Several research thrusts in the area of data management have focused on understanding how changes in
the data affect the output of a view or standing query. Example applications are explaining query results,
propagating updates through views, and anonymizing datasets. These applications usually rely on
understanding how interventions in a database impact the output of a query. An important aspect of this analysis
is the problem of deleting a minimum number of tuples from the input tables to make a given Boolean query
false. We refer to this problem as “the resilience of a query” and show its connections to the well-studied
problems of deletion propagation and causal responsibility. In this paper, we study the complexity of
resilience for self-join-free conjunctive queries, and also make several contributions to previously known results
for the problems of deletion propagation with source side-effects and causal responsibility. (1) We define the
notion of resilience and provide a complete dichotomy for the class of self-join-free conjunctive queries with
arbitrary functional dependencies; this dichotomy also extends and generalizes previous tractability results
on deletion propagation with source side-effects. (2) We formalize the connection between resilience and
causal responsibility, and show that resilience has a larger class of tractable queries than responsibility.
(3) We identify a mistake in a previous dichotomy for the problem of causal responsibility and offer a revised
characterization based on new, simpler, and more intuitive notions. (4) Finally, we extend the dichotomy for
causal responsibility in two ways: (a) we treat cases where the input tables contain functional dependencies,
and (b) we compute responsibility for a set of tuples specified via wildcards.


1. INTRODUCTION 

As data continues to grow in volume, the results of relational queries become harder 
to understand, interpret, and debug through manual inspection. Data management 
research has recognized this fundamental need to derive explanations for query results 
and explanations for surprising observations. Existing work has defined explanations
as predicates in a query [Wu and Madden 2013; Roy and Suciu 2014; Chapman and
Jagadish 2009], or as modifications to the input data [Meliou et al. 2010; Huang et al.
2008; Herschel et al. 2009]. In the latter category, the metric of causal responsibility,
first introduced by Chockler and Halpern [2004], quantifies the contribution of an input
tuple to a particular output. One can then derive explanations by ranking input
tuples using their responsibilities: tuples with high degree of responsibility are better
explanations for a particular query result than tuples with low responsibility [Meliou
et al. 2010].

A seemingly unrelated notion, the concept of deletion propagation with source
side-effects [Buneman et al. 2002], seeks a minimum set of tuples in the input tables that
should be deleted from the database in order to delete a particular tuple from a query. 
Query results that have a larger set of tuples that need to be deleted are more reliable 
or more “robust” to changes in the input database than others. This measure of relative 
importance can provide another type of explanation and allows us to rank the output 
tuples by their relative robustness. 

In this paper, we take a step back and re-examine how particular interventions (tuple
deletions in the input of a query) impact its output. Specifically, we study how “resilient”
a Boolean query is with respect to such interventions. Resilience identifies the
smallest number of tuples to delete from the input to make the query false. We will
show that characterizing the complexity of this problem also allows us to study the
complexities of both deletion propagation with source side-effects and causal responsibility
with minor modifications.

[Figure 1. Panels: (a) Source-side effects: min $|\Gamma|$; (c) Responsibility: min $|\Gamma|$;
(d) View-side effects: min $|\Delta|$; (e) Source-side effect problem: prior and our dichotomy
results; (f) View-side effect problem: prior dichotomy results. Table rows include SJ (queries
with selections and joins; PTIME [Buneman et al. 2002]) and PJ (queries with projections and joins).]

Fig. 1: This paper contains dichotomy results for (a) deletion propagation with source
side-effects, (b) resilience, and (c) responsibility for causality. Among other things, they
imply a complete dichotomy for the source side-effect problem for the class of self-join-free
conjunctive queries in the presence of functional dependencies (e). Thus, this part of our
work is similar in scope to [Kimelfeld et al. 2012] and [Kimelfeld 2012] for the problem
of view side-effects (f). We derive these results by analyzing a simpler concept:
the resilience of Boolean queries. In addition (not shown in the figure), we provide a
correction to a prior dichotomy result for causal responsibility and then extend it in
two ways: responsibility for tables with functional dependencies and responsibility for
tuples with wildcards, e.g., $S(*, 5, 7)$.


Deletion propagation and existing results. Databases allow users to interact with
data through views, which are often conjunctive queries. Views can be used to simplify
complex queries, enforce access control policies, and preserve data independence
for external applications. Of particular interest is how deletions in the input data affect
the view (which is a trivial problem), but also how deletions in the view could be
achieved by appropriately chosen deletions in the input data (which is far less trivial).
Concretely, the problem of deletion propagation [Buneman et al. 2002; Dayal and
Bernstein 1982] seeks a set $\Gamma$ of tuples in the input tables that should be deleted from
the database in order to delete a particular tuple from the view. Intuitively, this deletion
should be achieved with minimal side-effects, where side-effects are defined with
either of two objectives: (a) deletion propagation with source side-effects ($\mathrm{DP}_{\mathrm{source}}$) seeks
a minimum set of input tuples $\Gamma$ in order to delete a given output tuple; whereas (b)
deletion propagation with view side-effects ($\mathrm{DP}_{\mathrm{view}}$) seeks a set of input tuples $\Gamma$ that
results in a minimum number of output tuple deletions in the view, other than the
tuple of interest [Buneman et al. 2002].

Example 1.1 (Source & View side effects). Consider the query

$q(x, u) \mathrel{:-} R(x, y), S(y, z, w), T(w, u)$

defining a view over the database $R, S, T$ shown below. To delete tuple $v_1$ from the
resulting view with minimum source side-effects, one only needs to remove tuple $t_1$
from the database. Therefore, the optimal solution to $\mathrm{DP}_{\mathrm{source}}$ is $\Gamma = \{t_1\}$ with $|\Gamma| = 1$
(see Fig. 1a).

However, the deletion of $t_1$ also removes $v_2$, which is a view side-effect: $\Delta = \{v_2\}$
with $|\Delta| = 1$. The optimal solution to $\mathrm{DP}_{\mathrm{view}}$, which minimizes the side-effects on the
view (set $\Delta$), is the set of input tuples $\Gamma = \{r_1, r_2\}$: deleting these two tuples removes
only $v_1$ from the view but not $v_2$, and thus has no view side-effects, i.e., $\Delta = \emptyset$ with
$|\Delta| = 0$ (see Fig. 1d).
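
The two objectives of Example 1.1 can be made concrete with a small brute-force sketch. The instance below is hypothetical (it is not the database of the original figure) and was chosen only so that it reproduces the behavior described in the example; the function names are likewise illustrative, and the exhaustive search is exponential and meant for toy inputs only.

```python
from itertools import combinations

# Hypothetical instance (an assumption, not the paper's tables).
# Each fact is a pair (relation name, tuple).
R = [("R", (1, 1)), ("R", (1, 2)), ("R", (2, 3))]
S = [("S", (1, 1, 7)), ("S", (2, 2, 7)), ("S", (3, 3, 7))]
T = [("T", (7, 9))]
DB = frozenset(R + S + T)


def view(db):
    """Evaluate q(x, u) :- R(x, y), S(y, z, w), T(w, u) over a set of facts."""
    r = [t for rel, t in db if rel == "R"]
    s = [t for rel, t in db if rel == "S"]
    tt = [t for rel, t in db if rel == "T"]
    return {(x, u)
            for (x, y) in r
            for (y2, _z, w) in s if y2 == y
            for (w2, u) in tt if w2 == w}


def deletion_sets(db, target):
    """Yield every set of facts whose deletion removes `target`, smallest first."""
    facts = sorted(db)
    for k in range(len(facts) + 1):
        for gamma in combinations(facts, k):
            if target not in view(db - set(gamma)):
                yield set(gamma)


def min_source_side_effects(db, target):
    """A smallest deletion set Gamma (the DP_source objective)."""
    return next(deletion_sets(db, target))


def min_view_side_effects(db, target):
    """A deletion set minimizing |Delta| (the DP_view objective), ties by |Gamma|."""
    before = view(db)

    def delta(gamma):
        # Output tuples other than the target that disappear from the view.
        return before - view(db - gamma) - {target}

    return min(deletion_sets(db, target), key=lambda g: (len(delta(g)), len(g)))
```

On this instance the view is {(1, 9), (2, 9)}. Deleting the single T-tuple removes the target (1, 9) but also (2, 9), whereas deleting the two R-tuples with x = 1 removes only the target, mirroring the contrast between the two objectives.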

Known complexity results. Buneman et al. [2002] showed that both variants are
in general NP-complete for conjunctive queries containing projections and joins (PJ),
whereas they are in PTIME for queries containing only selections and joins (SJ). Later,
Cong et al. [2012] identified a class of PJ queries, called “key-preserving,” for which
both problem variants can be solved in PTIME. According to these two results, the
query from Example 1.1 falls into the general class of NP-complete queries.

In addition, Kimelfeld et al. [2012] provided a more refined dichotomy result for
the problem of minimal view side-effects for self-join-free conjunctive queries (CQs).
This dichotomy leads to more polynomial time cases, as it characterizes the complexity
based on a property of the query structure (using the property of “head domination”),
rather than high-level database operators (e.g., projections and joins). For example,
the query of Example 1.1 is not head-dominated, which means that $\mathrm{DP}_{\mathrm{view}}$ is indeed
NP-complete for that query. Later work has also extended the dichotomy result to
self-join-free CQs with functional dependencies (FDs) [Kimelfeld 2012].

Causal responsibility and existing results. The problem of causal responsibility
[Meliou et al. 2010] seeks, for a given query and a specified input tuple, a minimum
set of other input tuples $\Gamma$ that, if deleted, would make the tuple of interest
“counterfactual,” i.e., the query would be true with that tuple present, and false if the tuple was
also deleted. Both problems of resilience and of causal responsibility rely on the notion
of minimal interventions in the input database and are thus closely related. However,
we will show that resilience is easier (has lower complexity) than responsibility, and
provide extensive discussion of the connections among all these related problems.

Example 1.2 (Resilience & Causal responsibility). Consider again the query from
Example 1.1 and the output tuple $v_1 = (1, 9)$. Applying the substitution $[(x, u)/(1, 9)]$,
i.e., substituting the variables $x$ and $u$ with 1 and 9, respectively, we get a
query $q(1, 9) \mathrel{:-} R(1, y), S(y, z, w), T(w, 9)$. The solution to $\mathrm{DP}_{\mathrm{source}}$ for $q$ and tuple
$v_1$ is then equivalent to the solution of the resilience problem over the Boolean
query $q' \mathrel{:-} R'(y), S(y, z, w), T'(w)$ over the database $R', S, T'$ with $R'(y) \mathrel{:-} R(1, y)$ and
$T'(w) \mathrel{:-} T(w, 9)$ shown below. The answer to the resilience problem for $q'$ is $\Gamma = \{t_1\}$
with $|\Gamma| = 1$: deleting tuple $t_1$ makes the query false (also see Fig. 1b).

The causal responsibility problem requires a tuple in the lineage of the query as
additional input. For example, the responsibility of tuple $s_1$ in query $q'$ corresponds
to the contingency set $\Gamma = \{s_2, s_3\}$ with $|\Gamma| = 2$. Deleting these two tuples makes $s_1$
a counterfactual cause for $q'$, i.e., the query is true if $s_1$ is present and false otherwise
(also see Fig. 1c).

Known complexity results. Meliou et al. [2010] showed that causality of a given tuple
can be computed in polynomial time for any conjunctive query. Further, that work
presented a dichotomy result for computing causal responsibility for self-join-free
conjunctive queries, based on a characterization of a query property called weak linearity.
However, in this work, we identify an error in the existing dichotomy which classified 
certain hard queries into the polynomial class of queries. In particular, we found that 
the existing notion of “domination” is not sufficient to characterize the dichotomy and 
we provide here a refinement of domination called “full domination” that together with 
a new concept of “triads” solves this issue. 


Contributions of our work. In this paper, we study the problem of minimal interventions
with respect to a new notion called resilience of a Boolean query, which is a
minimum number of input tuples that need to be deleted in order to make the query 
false. A method that provides a solution to resilience can immediately also provide an 
answer to the deletion propagation with source-side effects problem by defining a new 
Boolean query and database, replacing all head variables in the view with constants 
of the output tuple. We define our results in terms of “resilience” since the notion of 
resilience has obvious analogies to universally known minimal set cover problems. At 
the same time, our complexity results on resilience also allow us to study the problem 
of causal responsibility. We thus state our contributions with respect to both deletion 
propagation and causal responsibility. 

(1) Contributions to deletion propagation. Our results on resilience imply a
refinement for the complexity of minimum source side-effects by defining a novel, yet
simple and intuitive property of the query structure called “triads.” For the class of
self-join-free conjunctive queries, we show that resilience is NP-complete if the query
contains this structure, and PTIME otherwise (Section 3). Determining whether a query
contains a triad can be done very efficiently, in polynomial time with respect to query
complexity. This implies that $\mathrm{DP}_{\mathrm{source}}$ can always be solved in PTIME for the query of
Example 1.1. These results are analogous to the results of Kimelfeld et al. [2012] for
the view-side effect problem. In addition, our dichotomy criterion also allows the
specification of “forbidden” tables (called exogenous tables) that do not allow deletions. This
is an extension to the traditional definition of the deletion propagation problem and
affects the complexity of queries in non-obvious ways (defining a table as exogenous
can make both easy queries hard, and hard queries easy).

Our work also provides a complete dichotomy result for the class of self-join-free CQs
with Functional Dependencies (Section 4). These results are analogous to the results
of Kimelfeld [2012] for the view-side effect problem. At a high level, we define rewrite
steps that are induced by the functional dependencies, and check the resulting query
for the presence of triads.

In particular, our dichotomy result on the resilience of a Boolean conjunctive query 
provides new tractable solutions to the otherwise hard minimum hypergraph vertex 
cover problem. Our PTIME classes for resilience define families of hypergraphs for 
which minimum vertex cover is also always in PTIME. As such, resilience provides 
an intuitive definition that can draw analogies to problems even outside the database 
community. However, these implications are outside the scope of this paper. 

(2) Contributions to causal responsibility. We show that responsibility is a more
fine-grained notion than resilience, resulting in higher complexity. In particular, we
show query $q_{\mathrm{rats}}$ in Fig. 2b for which resilience is in PTIME (Cor. 3.22), whereas
responsibility is NP-complete (Prop. 5.1). The benefit of responsibility is that it allows
us to rank input tuples based on their impact on a query, thus making it applicable
to settings where this ranking is important, such as providing explanations and data
compression (by compressing data with small contributions to an output). In Section 7,
we discuss ways to use resilience in these applications, and thus benefit from its
reduced complexity compared to responsibility.

In addition, we found that responsibility is a more subtle concept than we previously
thought. In particular, we identified an error in the existing dichotomy for responsibility
[Meliou et al. 2010] which classified certain hard queries into the polynomial
class of queries. Specifically, we found that the existing notion of “domination” is not
sufficient to characterize the dichotomy. In Section 5 we provide a refinement of domination
called “full domination” that helps us solve this issue. In addition, our new
results provide two significant extensions to the previous dichotomy: (a) We generalize
the notion of responsibility from simple tuples to tuples with wildcards. (b) We show
that through a process of query rewrites, our dichotomy results continue to hold in the
presence of functional dependencies over the input relations.


Outline. Section 2 defines all notions mentioned here more formally and discusses
the connections of resilience with deletion propagation and causal responsibility.
Sections 3 and 4 contain our two main technical contributions for the problem of resilience,
while Section 5 corrects the dichotomy of responsibility and extends it to the case of
tuples with wildcards and functional dependencies. Section 6 reviews additional related
work, and Section 7 discusses implications, open problems, and future directions.


2. FORMAL SETUP AND CONNECTIONS 

This section introduces our notation, defines resilience, and formalizes the connections 
between the problems of resilience, deletion propagation, and causal responsibility. 

General notations. We use boldface (e.g., $\mathbf{x} = (x_1, \ldots, x_k)$) to denote tuples or
ordered sets. A self-join-free conjunctive query (sj-free CQ) is a first-order formula
$q(\mathbf{y}) = \exists \mathbf{x}\, (A_1 \wedge \ldots \wedge A_m)$ where the variables $\mathbf{x} = (x_1, \ldots, x_k)$ are called existential
variables, $\mathbf{y} = (y_1, \ldots, y_c)$ are called the head variables (or free variables), and each
atom $A_i$ represents a relation $R_i(\mathbf{z}_i)$ where $\mathbf{z}_i \subseteq \mathbf{x} \cup \mathbf{y}$.¹

The term “self-join-free” means that no relation symbol occurs more than once. We
write $\mathrm{var}(A_j)$ for the set of variables occurring in atom $A_j$. The database instance is
then the union of all tuples in the relations, $D = \bigcup_i R_i$. As usual, we abbreviate the
query in Datalog notation by $q(\mathbf{y}) \mathrel{:-} A_1, \ldots, A_m$. For tuple $t$, we write $D \models q[t/\mathbf{y}]$ to
denote that $t$ is in the query result of the non-Boolean query $q(\mathbf{y})$ over database $D$.
The set of query results over database $D$ is denoted by $q(\mathbf{y})^D$.

Unless otherwise stated, a query in this paper denotes a sj-free Boolean conjunctive
query $q$ (i.e., $\mathbf{y} = \emptyset$). Because we only have sj-free CQs, we do not have two atoms
referring to the same relation, so we may refer to atoms and relations interchangeably. We
write $D \models q$ to denote that the query $q$ evaluates to true over the database instance
$D$, and $D \not\models q$ to denote that $q$ evaluates to false. We call a valuation of all existential
variables that is permitted by $D$ and that makes $q$ true a witness $\mathbf{w}$.² The set of
witnesses of $D \models \exists \mathbf{x}\, (A_1 \wedge \ldots \wedge A_m)$ is the set $\{\mathbf{w} \mid D \models (A_1 \wedge \ldots \wedge A_m)[\mathbf{w}/\mathbf{x}]\}$.
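
As an illustration of the notion of witness, the following minimal sketch enumerates all witnesses of a Boolean sj-free CQ by a naive nested-loop join. The encoding (atoms as pairs of a relation name and a tuple of variable names, databases as dictionaries of tuple sets), the function name `witnesses`, and the instance are assumptions made for this example only.

```python
def witnesses(atoms, db):
    """Return all valuations (dicts: variable -> value) that satisfy every atom."""
    results = [{}]
    for rel, variables in atoms:
        extended = []
        for valuation in results:
            for tup in db.get(rel, ()):
                candidate = dict(valuation)
                consistent = True
                for v, c in zip(variables, tup):
                    # setdefault binds an unbound variable, or returns its current value.
                    if candidate.setdefault(v, c) != c:
                        consistent = False
                        break
                if consistent:
                    extended.append(candidate)
        results = extended
    return results


# Hypothetical Boolean query q' :- R(y), S(y, z, w), T(w) and instance.
Q = [("R", ("y",)), ("S", ("y", "z", "w")), ("T", ("w",))]
DB = {
    "R": {(1,), (2,), (3,)},
    "S": {(1, 5, 7), (2, 5, 7), (3, 6, 7)},
    "T": {(7,)},
}
W = witnesses(Q, DB)
```

Here `W` contains one valuation per value of y; the query is true over the instance exactly when `W` is non-empty.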

A database instance may contain some “forbidden” tuples that may not be deleted.
Since we are interested in the data complexity of resilience, we specify at the query
level which tables contain tuples that may or may not be deleted. Those atoms from
which tuples may not be deleted are called exogenous,³ and we write these atoms or
relations with a superscript “x”. The other atoms, whose tuples may be deleted, are
called endogenous. We may occasionally attach the superscript “n” to an atom to
emphasize that it is endogenous. Moreover, we can refer to a database as a partition of its
tables into its exogenous and endogenous parts, $D = D^x \cup D^n$.

2.1. Query resilience 

In this paper, we focus on determining the resilience of a query with regard to changes
in $D^n$. Given $D \models q$, our motivating question is: what is the minimum number of tuples
to remove in order to make the query false?

Definition 2.1 (Resilience). Given a query $q$ and database $D$, we say that $(D, k) \in
\mathrm{RES}(q)$ if and only if $D \models q$ and there exists some $\Gamma \subseteq D^n$ such that $D - \Gamma \not\models q$ and
$|\Gamma| \le k$.

In other words, $(D, k) \in \mathrm{RES}(q)$ means that there is a set of $k$ or fewer tuples in the
endogenous tables of $D$, the removal of which makes the query false. Observe that
since $q$ is computable in PTIME, $\mathrm{RES}(q) \in \mathrm{NP}$. We will see that there is a dichotomy for
all sj-free conjunctive queries: for all such queries $q$, either $\mathrm{RES}(q) \in \mathrm{PTIME}$ or $\mathrm{RES}(q)$ is
NP-complete (Theorem 3.24). We are naturally interested in the optimization version
of this decision problem: given $q$ and $D$, find the minimum $k$ so that $(D, k) \in \mathrm{RES}(q)$.
A larger $k$ implies that the query is more “resilient” and requires the deletion of more
tuples to change the query output.
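
The optimization version of Definition 2.1 can be sketched by exhaustive search over subsets of endogenous tuples. This is only a definitional illustration under an assumed encoding (atoms as pairs of relation name and variable tuple, exogenous tables given by name); it takes exponential time and is not one of the polynomial-time algorithms developed later in the paper.

```python
from itertools import combinations


def holds(atoms, db, valuation=None):
    """Return True iff the Boolean conjunctive query `atoms` is true over `db`."""
    valuation = valuation or {}
    if not atoms:
        return True
    (rel, variables), rest = atoms[0], atoms[1:]
    for tup in db.get(rel, ()):
        extended = dict(valuation)
        # setdefault binds an unbound variable, or returns its current value.
        if all(extended.setdefault(v, c) == c for v, c in zip(variables, tup)):
            if holds(rest, db, extended):
                return True
    return False


def resilience(atoms, db, exogenous=()):
    """Smallest k with (D, k) in RES(q), found by brute force.

    Returns None if D does not satisfy q, or if no set of endogenous
    tuples makes q false (e.g., when every table is exogenous).
    """
    if not holds(atoms, db):
        return None
    endogenous = [(rel, t) for rel, ts in db.items()
                  if rel not in exogenous for t in ts]
    for k in range(1, len(endogenous) + 1):
        for gamma in combinations(endogenous, k):
            remaining = {rel: {t for t in ts if (rel, t) not in gamma}
                         for rel, ts in db.items()}
            if not holds(atoms, remaining):
                return k
    return None


# Hypothetical query q' :- R(y), S(y, z, w), T(w) and instance.
Q = [("R", ("y",)), ("S", ("y", "z", "w")), ("T", ("w",))]
DB = {
    "R": {(1,), (2,), (3,)},
    "S": {(1, 5, 7), (2, 5, 7), (3, 6, 7)},
    "T": {(7,)},
}
```

On this instance, deleting the single T-tuple falsifies the query, so the resilience is 1. Declaring T exogenous forces one deletion per witness and raises the resilience to 3, a small illustration of how exogenous tables change the answer.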


1 We assume w.l.o.g. that $\mathbf{z}_i$ is a tuple of only variables without constants. This is so, because for any constant
in the query, we can first apply a selection on each table and then consider the modified query with a column
removed (see the transformation from resilience to source side-effects for details).

2 Notice that our notion of witness slightly differs from the one commonly seen in provenance literature
where a “witness” refers to a subset of the input database records that is sufficient to ensure that a given
output tuple appears in the result of a query [Cheney et al. 2009].

3 In other words, tuples in these atoms provide context and are outside the scope of possible “interventions”
in the spirit of causality [Halpern and Pearl 2005].

In this paper, we focus on Boolean queries; however, we can also define the resilience
problem for non-Boolean queries as follows:

Definition 2.2 (Resilience for non-Boolean queries). Given non-Boolean query $q(\mathbf{y})$
and database $D$, we say that $(D, k) \in \mathrm{RES}(q(\mathbf{y}))$ if and only if $q(\mathbf{y})^D \neq \emptyset$ and there exists
some $\Gamma \subseteq D^n$ such that $q(\mathbf{y})^{D - \Gamma} = \emptyset$ and $|\Gamma| \le k$.

It is clear from the definition that we are interested in eliminating all the output tuples
from the query result, and it is easy to see that $\mathrm{RES}(q(\mathbf{y})) = \mathrm{RES}(q')$, where $q'$ is obtained
by removing all variables $\mathbf{y}$ from the head of $q$, turning them into existential variables.

We can refine this definition to include a target tuple t, i.e., instead of deleting all 
output tuples from the query result, we would like to delete only one output tuple t. 
As we saw in the introduction, this is the exact definition of the deletion propagation 
problem. The next subsection will make the correspondence between resilience and 
deletion propagation with source side-effects precise. 


2.2. Deletion propagation: source side-effects 

Deletion propagation in view updates generally refers to non-Boolean queries
$q(\mathbf{y}) \mathrel{:-} A_1, \ldots, A_m$. We next define the problem [Buneman et al. 2002; Dayal and
Bernstein 1982] formally in our notation:

Definition 2.3 (Source side-effects). Given a query $q(\mathbf{y})$, database $D$, and an output
tuple $t$, we say that $(D, t, k) \in \mathrm{DP}_{\mathrm{source}}(q(\mathbf{y}))$ if and only if $t \in q(\mathbf{y})^D$ and there exists
some $\Gamma \subseteq D$ such that $t \notin q(\mathbf{y})^{D - \Gamma}$ and $|\Gamma| \le k$.

It is easy to see that there is a homomorphism between resilience and the source
side-effect variant of deletion propagation. We have illustrated this correspondence in
Example 1.2 and next describe this transformation more formally.

Given a conjunctive query $q(\mathbf{y}) \mathrel{:-} A_1, \ldots, A_m$ and a tuple $t = \mathbf{c}$ in the output $q(\mathbf{y})^D$,
we first obtain a Boolean query $q'$ by deleting the head variables in $q(\mathbf{y})$. Then we
modify the database by applying a filter (selection): for each relation $R_i(\mathbf{z}_i)$ we define
a new relation $R'_i(\mathbf{x}_i) \mathrel{:-} R_i(\theta_t(\mathbf{z}_i))$ with $\mathbf{x}_i$ being the existential variables that occur in
$R_i$, and where the substitution $\theta_t : \mathbf{y} \to \mathbf{c}$ replaces the former head variables with
the corresponding constants from $t$ and keeps the existential variables as they are. For
example, $R'(y) \mathrel{:-} R(1, y)$ in Example 1.2 (see Fig. 1a and Fig. 1b). This will lead to a
new database $D' = \bigcup_i R'_i$ and a new Boolean query $q' \mathrel{:-} A'_1, \ldots, A'_m$, where $A'_i = R'_i(\mathbf{x}_i)$
if $A_i = R_i(\mathbf{z}_i)$, for which the following holds.⁴

Corollary 2.4 (Resilience & Source side-effects). Given a query $q(\mathbf{y})$,
database $D$, and output tuple $t \in q(\mathbf{y})^D$, let $q'$ and $D'$ be the new Boolean query
and new database instance obtained by the above transformation. Then: $(D, t, k) \in
\mathrm{DP}_{\mathrm{source}}(q(\mathbf{y})) \Leftrightarrow (D', k) \in \mathrm{RES}(q')$.
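
The transformation behind Corollary 2.4 can be sketched as a selection followed by dropping the head-variable columns. The encoding and the function name `to_resilience_instance` are assumptions for illustration; unlike the text, the sketch renames every relation with a prime, including those that contain no head variable.

```python
def to_resilience_instance(head, atoms, db, t):
    """Build (q', D') from q(y), D, and an output tuple t.

    head  : tuple of head variable names y
    atoms : list of (relation name, tuple of variable names)
    db    : dict mapping relation name to a set of tuples
    t     : output tuple, aligned with `head`
    """
    theta = dict(zip(head, t))  # substitution: head variable -> constant
    new_atoms, new_db = [], {}
    for rel, variables in atoms:
        # Positions holding existential variables are kept as columns.
        keep = [i for i, v in enumerate(variables) if v not in theta]
        new_atoms.append((rel + "'", tuple(variables[i] for i in keep)))
        new_db[rel + "'"] = {
            tuple(tup[i] for i in keep)
            for tup in db[rel]
            # Selection: head-variable positions must match the constants of t.
            if all(tup[i] == theta[v]
                   for i, v in enumerate(variables) if v in theta)
        }
    return new_atoms, new_db


# Hypothetical instance for q(x, u) :- R(x, y), S(y, z, w), T(w, u).
HEAD = ("x", "u")
ATOMS = [("R", ("x", "y")), ("S", ("y", "z", "w")), ("T", ("w", "u"))]
DB = {
    "R": {(1, 1), (1, 2), (2, 3)},
    "S": {(1, 1, 7), (2, 2, 7), (3, 3, 7)},
    "T": {(7, 9)},
}
NEW_ATOMS, NEW_DB = to_resilience_instance(HEAD, ATOMS, DB, (1, 9))
```

Because the dropped columns are fixed to constants by the selection, the projection is one-to-one on the surviving tuples, so deletion sets in $D'$ correspond directly to deletion sets in $D$.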


Notice that the same transformation can be used to treat constants in a CQ when
considering source side-effects. Thus, by solving the complexity of resilience, we
immediately also solve the problem of deletion propagation with source side-effects. We
prefer to present our results using the notion of resilience, as there are several
applications beyond view updates that relate to these problems. Examples include robustness
of network connectivity (identifying sets of nodes and edges that could disconnect a 
network), deriving explanations for query results (finding the lineage tuples that have 
most impact to an output), and problems related to set cover. We proceed to discuss 


4 An informal way to describe this transformation of $D$ at the query level is to first only keep tuples in the
lineage of $t$ and to then delete all columns in atoms that contain constants from $\mathbf{c}$.
existing results on the complexity of deletion propagation with source side-effects, and
explain how our results on the complexity of resilience extend this prior work.

Buneman et al. [2002] define a dichotomy for the hardness of $\mathrm{DP}_{\mathrm{source}}(q)$ based only
on the operations that occur in $q$, namely, selection, projection, join, union. Specifically,
they show that $\mathrm{DP}_{\mathrm{source}}(q(\mathbf{y}))$ is NP-complete for PJ and JU queries (i.e., queries
involving projections and joins, or queries involving joins and unions), while it is PTIME
for SJ and SPU queries (i.e., queries involving selections and joins, or queries
involving selections, projections, and unions only). Later, Cong et al. [2012] showed that
$\mathrm{DP}_{\mathrm{source}}(q(\mathbf{y}))$ is in PTIME for a SPJ query if all primary keys of the involved relations
appear in the head variables $\mathbf{y}$ (a condition called “key preservation”). Notice that the
concept of key preservation does not apply to the problem of resilience, as keys are
never preserved in Boolean queries.

In this paper, we identify a larger class of SPJ queries for which the problem of
resilience, and thus $\mathrm{DP}_{\mathrm{source}}(q(\mathbf{y}))$, is in PTIME, thus extending all prior results. In
Section 3, we provide a dichotomy result based on identifying a specific and very
intuitive structure in a query, called a triad: queries that contain a triad are NP-complete,
whereas those that do not are in PTIME. Our results refine the prior work in the sense
that prior results characterize the dichotomy at the level of operators used in the query
(e.g., joins, projections), while our result identifies all polynomial cases based on (i) the
actual query and (ii) additional schema knowledge of forbidden, “exogenous” tables. In
Section 4, we extend our results to even include (iii) functional dependencies.


2.3. Deletion propagation: view side-effects 

The problem of deletion propagation with view side-effects has a different objective 
than resilience: it attempts to minimize the changes in the view rather than the source. 

Definition 2.5 (View side-effects). Given a query $q(\mathbf{y})$, a database $D$, and a tuple $t$
in the view, we say that $(D, t, k) \in \mathrm{DP}_{\mathrm{view}}(q(\mathbf{y}))$ if and only if $t \in q(\mathbf{y})^D$ and there exists
some $\Gamma \subseteq D$ such that $t \notin q(\mathbf{y})^{D - \Gamma}$ and $|\Delta| \le k$, where $\Delta = q(\mathbf{y})^D - (q(\mathbf{y})^{D - \Gamma} \cup \{t\})$.
In other words, $\Delta$ is the set of tuples other than $t$ that were eliminated from the view.

The dichotomy results from Buneman et al. [2002] extend to the case of $\mathrm{DP}_{\mathrm{view}}(q)$,
and the same is true for key preservation [Cong et al. 2012]. Later, Kimelfeld et al.
[2012] refined the dichotomy for the view side-effect problem by providing a
characterization that uses the query structure: $\mathrm{DP}_{\mathrm{view}}(q(\mathbf{y}))$ is PTIME for queries that are head
dominated, and NP-complete otherwise. Head domination checks for the components
of the query that are connected by the existential variables, where all head variables
contained in the atoms of that component appear in a single atom in the query. Our
work in this paper offers a similar refinement for the dichotomy of $\mathrm{DP}_{\mathrm{source}}(q(\mathbf{y}))$ from
the characterization at the operator level to the characterization at the level of query
structure, plus knowledge of exogenous (“forbidden”) tables.


Functional dependencies. Kimelfeld [2012] augmented the dichotomy on $\mathrm{DP}_{\mathrm{view}}(q)$
for cases where functional dependencies (FDs) hold over the data instance $D$. The
tractability condition for this case checks whether the query has functional head
domination, which is an extension of the notion of head domination. We provide similar
extensions in this paper for the problem of $\mathrm{DP}_{\mathrm{source}}(q(\mathbf{y}))$: our dichotomy for the case
of FDs checks for triads after the query is structurally manipulated through a process
we call induced rewrites, which is basically a chase of FDs.


Multi-tuple deletion. Cong et al. [2012] also studied a variant of deletion
propagation that aims to remove a group of tuples from the view. Their results classify
all conjunctive queries as NP-complete, but recently, Kimelfeld et al. [2013] provided
a trichotomy for the class of sj-free CQs that extends the notion of head domination,
classifying queries into PTIME, $k$-approximable in PTIME, and NP-complete.


2.4. Causal responsibility 

A tuple $t$ is a counterfactual cause for a query if by removing it the query changes from
true to false. A tuple $t$ is an actual cause if there exists a set $\Gamma$, called the contingency
set, the removal of which makes $t$ a counterfactual cause. Determining actual causality is
NP-complete for general formulas [Eiter and Lukasiewicz 2002], but there are families
of tractable cases [Eiter and Lukasiewicz 2006]. Specifically, causality is PTIME for all
conjunctive queries [Meliou et al. 2010]. Responsibility measures the degree of causal
contribution of a particular tuple $t$ to the output of a query as a function of the size
[...] of Halpern and Pearl [2005] and Chockler and Halpern [2004], and were adapted
to queries in previous work [Meliou et al. 2010]. Even though responsibility ($\rho$) was
[...] we alter this definition slightly to draw parallels to the problem of resilience.


Definition 2.6 (Responsibility). Given query $q$, we say that $(D, t, k) \in \mathrm{RSP}(q)$ if and
only if $D \models q$ and there is $\Gamma \subseteq D^n$ such that $D - \Gamma \models q$ and $|\Gamma| \le k$ but $D - (\Gamma \cup \{t\}) \not\models q$.

In contrast to resilience, the problem of responsibility is defined for a particular
tuple $t$ in $D$, and instead of finding a $\Gamma$ that will leave no witnesses for $D - \Gamma \models q$,
we want to preserve only witnesses that involve $t$, so that there is no witness left for
$D - (\Gamma \cup \{t\}) \models q$. This difference, while subtle, is significant, and can lead to different
results. In Example 1.2, the resilience of query $q'$ has size 1 and contains tuple $t_1$.
However, the solution to the responsibility problem depends on the chosen tuple: the
contingency set of $s_1$ has size 2, and this size can be made arbitrarily bigger by adding
more tuples in $S$ with attribute $W = 7$. Furthermore, we show that the problems differ
in terms of their complexity.
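
To make the contrast with resilience concrete, the sketch below searches for a minimum contingency set in the sense of Definition 2.6 by brute force. The encoding, the helper names, and the instance are assumptions for this example; the instance is hypothetical and only mimics the shape of Example 1.2.

```python
from itertools import combinations


def holds(atoms, db, valuation=None):
    """Return True iff the Boolean conjunctive query `atoms` is true over `db`."""
    valuation = valuation or {}
    if not atoms:
        return True
    (rel, variables), rest = atoms[0], atoms[1:]
    for tup in db.get(rel, ()):
        extended = dict(valuation)
        if all(extended.setdefault(v, c) == c for v, c in zip(variables, tup)):
            if holds(rest, db, extended):
                return True
    return False


def remove(db, facts):
    """Return a copy of `db` without the given (relation, tuple) facts."""
    return {rel: {x for x in ts if (rel, x) not in facts}
            for rel, ts in db.items()}


def min_contingency(atoms, db, t, exogenous=()):
    """Smallest Gamma with D - Gamma |= q but D - (Gamma + {t}) not |= q, or None."""
    candidates = [(rel, x) for rel, ts in db.items()
                  if rel not in exogenous for x in ts if (rel, x) != t]
    for k in range(len(candidates) + 1):
        for gamma in combinations(candidates, k):
            still_true = holds(atoms, remove(db, gamma))
            false_without_t = not holds(atoms, remove(db, gamma + (t,)))
            if still_true and false_without_t:
                return set(gamma)
    return None


# Hypothetical query q' :- R(y), S(y, z, w), T(w) and instance.
Q = [("R", ("y",)), ("S", ("y", "z", "w")), ("T", ("w",))]
DB = {
    "R": {(1,), (2,), (3,)},
    "S": {(1, 5, 7), (2, 5, 7), (3, 6, 7)},
    "T": {(7,)},
}
```

On this instance the T-tuple is already a counterfactual cause (empty contingency set), while the S-tuple (1, 5, 7) needs a contingency set of size 2 that eliminates the two witnesses not involving it. Adding further S-tuples that join with R and T would increase that size, in line with the remark above.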

For completeness, we briefly recall the notions of reduction and equivalence in
complexity theory:

Definition 2.7 (Reduction ($\le$) and Equivalence ($\equiv$)). For two decision problems,
$S, T \subseteq \{0,1\}^*$, we say that $S$ is reducible to $T$ ($S \le T$) if there is an easy to compute
reduction $f : \{0,1\}^* \to \{0,1\}^*$ such that

$\forall w \in \{0,1\}^* \; (w \in S \Leftrightarrow f(w) \in T)$.

The idea is that the complexity of $S$ is less than or equal to the complexity of $T$
because any membership question for $S$ (i.e., whether $w \in S$) can be easily translated
into an equivalent question for $T$ (i.e., whether $f(w) \in T$). “Easy to compute” can be
taken as expressible in first-order logic.⁵ We say that two problems have equivalent
complexity ($S \equiv T$) iff they are inter-reducible, i.e., $S \le T$ and $T \le S$.


The problem of calculating resilience can always be reduced to the problem of
calculating responsibility.

LEMMA 2.8 ($\mathrm{RES} \le \mathrm{RSP}$). For any query $q$, $\mathrm{RES}(q) \le \mathrm{RSP}(q)$, i.e., there is a
reduction from $\mathrm{RES}(q)$ to $\mathrm{RSP}(q)$. Thus, if $\mathrm{RES}(q)$ is hard (i.e., NP-complete) then so is $\mathrm{RSP}(q)$.
Equivalently, if $\mathrm{RSP}(q)$ is easy (i.e., PTIME) then so is $\mathrm{RES}(q)$.


5 All reductions in this paper are first-order, i.e., when we write $S \le T$ we mean $S \le_{\mathrm{fo}} T$. First-order
reductions are natural for the relational database setting and they are more restrictive than logspace reductions,
which in turn are more restrictive than polynomial-time reductions ($S \le_{\mathrm{fo}} T \Rightarrow S \le_{\log} T \Rightarrow S \le_{p} T$)
[Immerman 1999].
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PROOF. Let q\— 3x\,... ,x s Ai(zi) A ••• A A r (z r ). The reduction from RES(g) to 
RSP(g) is as follows: given (D,k), we map it to (D', t 0 ,k) where D 1 consists of 
the database D together with unique new values o\.... a s and the new tuples 
Ai(zi[a/x]),.... A r (z r [a/x]). In other words, we enter a completely new witness a for 
q that has no values in common with the domain of D. Let t 0 = Ai(zi[a/x]), i.e., the 
tuple of these new values from atom A 1 . It follows that the size of the minimal contin¬ 
gency set for q in D is the same as the size of the minimal contingency set for q and t 0 
in D'. Thus, as desired, (D, k) £ RES (q) (D', t 0 , k) £ RSP(g). □ 
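The reduction in the proof of Lemma 2.8 can be checked by brute force on a toy instance. The following is a minimal sketch, not the paper's algorithm: the encoding of queries as lists of `(relation, variables)` atoms, the function names, and the sample database are all assumptions made for illustration, every tuple is treated as endogenous, and the exhaustive search is exponential and only suitable for tiny inputs.

```python
from itertools import combinations, product

def satisfies(db, query):
    """True iff the Boolean conjunctive query has at least one witness in db."""
    variables = sorted({v for _, vs in query for v in vs})
    domain = sorted({c for rel in db.values() for t in rel for c in t})
    for values in product(domain, repeat=len(variables)):
        env = dict(zip(variables, values))
        if all(tuple(env[v] for v in vs) in db[r] for r, vs in query):
            return True
    return False

def without(db, facts):
    """Copy of db with the given (relation, tuple) facts deleted."""
    return {r: {t for t in ts if (r, t) not in facts} for r, ts in db.items()}

def facts_of(db):
    return [(r, t) for r, ts in sorted(db.items()) for t in sorted(ts)]

def resilience(db, query):
    """Size of a minimum contingency set (exhaustive search)."""
    facts = facts_of(db)
    for k in range(len(facts) + 1):
        for gamma in combinations(facts, k):
            if not satisfies(without(db, set(gamma)), query):
                return k

def responsibility_size(db, query, t):
    """Minimum |Gamma| with D - Gamma |= q but D - (Gamma u {t}) not |= q."""
    others = [f for f in facts_of(db) if f != t]
    for k in range(len(others) + 1):
        for gamma in combinations(others, k):
            g = set(gamma)
            if satisfies(without(db, g), query) and \
               not satisfies(without(db, g | {t}), query):
                return k
    return None  # t is not a cause at all

def reduce_res_to_rsp(db, query):
    """Add one fresh witness and return (D', t0), as in the proof of Lemma 2.8."""
    variables = sorted({v for _, vs in query for v in vs})
    fresh = {v: "new_" + v for v in variables}   # hypothetical fresh values
    db2 = {r: set(ts) for r, ts in db.items()}
    for r, vs in query:
        db2[r].add(tuple(fresh[v] for v in vs))
    r0, vs0 = query[0]
    return db2, (r0, tuple(fresh[v] for v in vs0))

Q = [("A", ("x",)), ("R", ("x", "y")), ("S", ("y", "z"))]
D = {"A": {("a1",), ("a2",)},
     "R": {("a1", "b1"), ("a2", "b1")},
     "S": {("b1", "c1")}}
D2, t0 = reduce_res_to_rsp(D, Q)
print(resilience(D, Q), responsibility_size(D2, Q, t0))  # prints "1 1"
```

On this instance both values coincide, as the lemma predicts: deleting the single tuple of `S` breaks every old witness, and the fresh witness then depends only on `t0`.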


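Later we will see a query, $q_{\mathrm{rats}}$, for which $\mathrm{RES}(q_{\mathrm{rats}}) \in \mathrm{PTIME}$ (Cor. 3.22) but $\mathrm{RSP}(q_{\mathrm{rats}})$ is NP-complete (Prop. 5.1). Thus (assuming $\mathrm{P} \ne \mathrm{NP}$), $\mathrm{RSP}(q)$ is sometimes strictly harder than $\mathrm{RES}(q)$.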


3. COMPLEXITY OF RESILIENCE 

In this section we study the data complexity of resilience. We prove that the complexity of resilience of a query $q$ can be exactly characterized via a natural property of its dual hypergraph $\mathcal{H}(q)$ (Definition 3.1). In Section 3.1 we begin by showing that the resilience problems for two basic queries, the triangle query ($q_\triangle$) and the tripod query ($q_{\mathsf{T}}$), are both NP-complete. We then generalize these queries to a feature of hypergraphs that we call a triad (Definition 3.6), which is a set of 3 atoms that are connected in a special way in $\mathcal{H}(q)$. We then prove that if $\mathcal{H}(q)$ contains a triad, then $\mathrm{RES}(q)$ is NP-complete, i.e., determining resilience is hard. Conversely, we show in Section 3.2 that if $\mathcal{H}(q)$ does not contain any triad, then $\mathrm{RES}(q) \in \mathrm{PTIME}$. We prove this by showing how to transform a triad-free sj-free CQ into a linear query $q'$ of equivalent complexity. The resilience of linear queries can be computed in polynomial time using a reduction to network flow as shown in previous work [Meliou et al. 2010]. The desired dichotomy theorem for the resilience of sj-free CQs thus follows (Theorem 3.24).


3.1. Triads make resilience hard 

We will define triples of atoms called triads and then prove that if the dual hypergraph of a query $q$ contains a triad, then the resilience problem $\mathrm{RES}(q)$ is NP-complete.

We first define the (dual) hypergraph $\mathcal{H}(q)$ of query $q$. The hypergraph of a query $q$ is usually defined with its vertices being the variables of $q$ and the hyperedges being the atoms [Abiteboul et al. 1995]. In this paper we use only the dual hypergraph:


Definition 3.1 (Dual Hypergraph $\mathcal{H}(q)$). Let $q \,{:\!-}\, A_1, \ldots, A_m$ be an sj-free CQ. Its dual hypergraph $\mathcal{H}(q)$ has vertex set $V = \{A_1, \ldots, A_m\}$. Each variable $x_i \in \mathrm{var}(q)$ determines the hyperedge consisting of all those atoms in which $x_i$ occurs: $e_i = \{A_j \mid x_i \in \mathrm{var}(A_j)\}$.
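As a small illustration of Definition 3.1, the dual hypergraph can be computed directly from the atoms of a query. This is only a sketch: the representation of a query as a list of `(atom name, variables)` pairs and the name `dual_hypergraph` are assumptions made here, and the two sample queries are the triangle and tripod queries of Example 3.2.

```python
def dual_hypergraph(query):
    """Map each variable (hyperedge) to the set of atoms (vertices) it occurs in."""
    edges = {}
    for name, variables in query:
        for v in variables:
            edges.setdefault(v, set()).add(name)
    return edges

# Triangle and tripod queries, written as (atom name, variables) pairs.
Q_TRIANGLE = [("R", ("x", "y")), ("S", ("y", "z")), ("T", ("z", "x"))]
Q_TRIPOD = [("A", ("x",)), ("B", ("y",)), ("C", ("z",)), ("W", ("x", "y", "z"))]

H = dual_hypergraph(Q_TRIPOD)
for variable in sorted(H):
    print(variable, sorted(H[variable]))
```

For the tripod query, each of the three hyperedges joins one unary atom to `W`, which is the shape that gives the query its name.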


For example, Fig. 2 shows the dual hypergraphs of four important queries defined in Example 3.2. In this paper we only consider dual hypergraphs, so we use the shorter term "hypergraph" from now on. In fact we will think of a query and its hypergraph as one and the same thing. Furthermore, when we discuss vertices, edges and paths, we are referring to those objects in the hypergraph of the query under consideration. Thus, a vertex is an atom, an edge is a variable, and a path is an alternating sequence of vertices and edges, $A_1, x_1, A_2, x_2, \ldots, A_{n-1}, x_{n-1}, A_n$, such that for all $i$, $x_i \in \mathrm{var}(A_i) \cap \mathrm{var}(A_{i+1})$, i.e., the hyperedge $x_i$ joins vertices $A_i$ and $A_{i+1}$. We explicitly list the hyperedges in the path, because more than one hyperedge may join the same pair of vertices. Furthermore, since disconnected components of a query have no effect on each other, each of several disconnected components can be considered independently. We will thus assume throughout that all queries are connected. Similarly, without loss of generality, we assume no query contains two atoms with exactly the same set of variables.$^{6}$

Fig. 2: Example 3.2: The hypergraphs of queries $q_\triangle$, $q_{\mathrm{rats}}$, $q_{\mathrm{brats}}$, $q_{\mathsf{T}}$. $\{R, S, T\}$ is a triad of $q_\triangle$; $\{A, B, C\}$ is a triad of $q_{\mathsf{T}}$.


Example 3.2 (Important queries). Before we precisely define what a triad is, we identify two hard queries, $q_\triangle$ and $q_{\mathsf{T}}$, and two related queries, $q_{\mathrm{rats}}$ and $q_{\mathrm{brats}}$ (see Fig. 2 for drawings of their hypergraphs).

$q_\triangle \,{:\!-}\, R(x, y), S(y, z), T(z, x)$ (Triangle)

$q_{\mathrm{rats}} \,{:\!-}\, A(x), R(x, y), S(y, z), T(z, x)$ (Rats)

$q_{\mathrm{brats}} \,{:\!-}\, A(x), R(x, y), B(y), S(y, z), T(z, x)$ (Brats)

$q_{\mathsf{T}} \,{:\!-}\, A(x), B(y), C(z), W(x, y, z)$ (Tripod)


We now prove that $q_\triangle$ and $q_{\mathsf{T}}$ are both hard, i.e., their resilience problems are NP-complete. This will lead us to the definition of a triad: the hypergraph property that implies hardness. Later we will see that $q_{\mathrm{brats}}$ is easy for both resilience and responsibility. However, counter to our initial intuition, $q_{\mathrm{rats}}$ is easy for resilience but hard for responsibility.


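Proposition 3.3 (Triangle $q_\triangle$ is hard). $\mathrm{RES}(q_\triangle)$ and $\mathrm{RSP}(q_\triangle)$ are NP-complete.

Proof. We reduce 3SAT to $\mathrm{RES}(q_\triangle)$. It will then follow that $\mathrm{RES}(q_\triangle)$ is NP-complete, and thus so is $\mathrm{RSP}(q_\triangle)$ by Lemma 2.8. Let $\psi$ be a 3CNF formula with $n$ variables $v_1, \ldots, v_n$ and $m$ clauses $C_0, \ldots, C_{m-1}$. Our reduction will map any such $\psi$ to a pair $(D_\psi, k_\psi)$ where $D_\psi$ is a database satisfying $q_\triangle$, and

$$\psi \in \mathrm{3SAT} \;\Leftrightarrow\; (D_\psi, k_\psi) \in \mathrm{RES}(q_\triangle) \tag{3.4}$$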


6 If two atoms $A$, $B$ appear in $q$ with the identical set of variables, we can replace $A$ by $A \cap B$ and delete $B$.

Fig. 3: A six-node segment of the gadget $G_i$ in the hardness proof for $q_\triangle$: A minimum contingency set chooses either all the solid lines marked $v_i$, or all the solid lines marked $\bar{v}_i$. The dotted lines are sad because each of them is only part of one single RGB triangle, thus they are never chosen.

Fig. 4: Each gadget $G_i$ in the hardness proof for $q_\triangle$ is a cycle containing $2m$ six-node segments and a total of $12m$ RGB triangles. They can all be eliminated by removing the $6m$ edges marked $v_i$ or the $6m$ edges marked $\bar{v}_i$. The even numbered segments are sad because they are never used for connecting different gadgets (corresponding to clauses that use several variables); they only separate the odd ones, thus preventing spurious triangles.


In our construction, if $\psi \in \mathrm{3SAT}$, then the size of each minimum contingency set for $q_\triangle$ in $D_\psi$ will be $k_\psi = 6mn$, whereas if $\psi \notin \mathrm{3SAT}$, then the size of all contingency sets for $q_\triangle$ in $D_\psi$ will be greater than $k_\psi$.

Notice that $D_\psi \models q_\triangle$ iff it contains three tuples $R(a, b)$, $S(b, c)$, $T(c, a)$ that together form a witness. We visualize $R(a, b)$ as a red edge, $S(b, c)$ as a green edge and $T(c, a)$ as a blue edge. In other words, each witness $(a, b, c)$ for $D_\psi \models q_\triangle$ forms an RGB triangle. (Notice that the edge direction $a \to b$ drawn in Figures 3, 4 and 5 corresponds to the variable order in $R$, and analogously for $S$ and $T$.) The job of a contingency set for $q_\triangle$ is to remove all RGB triangles.

$D_\psi$ contains one circular gadget $G_i$ for each variable $v_i$. The circle consists of $12m$ solid edges, half of them marked $v_i$ and the other half marked $\bar{v}_i$ (see Figures 3 and 4). Note that there are $12m$ RGB triangles and they can be minimally broken by choosing the $6m$ $v_i$ edges or the $6m$ $\bar{v}_i$ edges. Any other way would require more edges removed. Thus, each minimum contingency set for $D_\psi$ corresponds to a truth assignment to the variables of $\psi$. And there will be a minimum contingency set of size $k_\psi = 6mn$ iff $\psi \in \mathrm{3SAT}$.

We complete the construction of $D_\psi$ by adding one RGB triangle for each clause $C_j$. For example, suppose $C_j = (v_1 \vee \bar{v}_2 \vee v_3)$. The RGB triangle we add consists of a red edge marked $v_1$, a green edge marked $\bar{v}_2$ and a blue edge marked $v_3$ (see Fig. 5). Note that if the chosen assignment satisfies $C_j$, then all $v_1$ edges are removed, or all $\bar{v}_2$ edges are removed, or all $v_3$ edges are removed. Thus the $C_j$ triangle is automatically removed.

Fig. 5: For clause $C_j = (v_1 \vee \bar{v}_2 \vee v_3)$ in the hardness proof for $q_\triangle$, we identify vertices $b^1_{4j+1} \in G_1$ with $b^2_{4j+1} \in G_2$; $c^2_{4j+1} \in G_2$ with $c^3_{4j+1} \in G_3$; and $a^3_{4j+2} \in G_3$ with $a^1_{4j+1} \in G_1$. This RGB triangle will be deleted iff the chosen variable assignment satisfies $C_j$.

How do we create $C_j$'s RGB triangle? Remember that we have chosen $G_i$ to contain 2 segments for each clause. We use segment $2j+1$ of $G_i$ to produce the $v_i$ or $\bar{v}_i$ used in $C_j$'s triangle. The even numbered segments are not used: they serve as buffers to prevent spurious RGB triangles from being created. In Fig. 4 we mark these even segments with frowns: they are sad because they are never used.

More precisely, the red $v_1$-edge from $G_1$ is $(a^1_{4j+1}, b^1_{4j+1})$, the green $\bar{v}_2$-edge from $G_2$ is $(b^2_{4j+1}, c^2_{4j+1})$, and the blue $v_3$-edge from $G_3$ is $(c^3_{4j+1}, a^3_{4j+2})$ (see Fig. 5).


Now to make this an RGB triangle in $D_\psi$, we identify the two $a$-vertices, the two $b$-vertices and the two $c$-vertices. In other words, $G_1$'s $a$-vertex $a^1_{4j+1}$ is equal to $G_3$'s $a$-vertex $a^3_{4j+2}$, i.e., they are the same element of the domain of $D_\psi$. We have thus constructed $C_j$'s RGB triangle (see Fig. 5).

The key idea is that these identifications can only create this single new RGB triangle because there is no other way to get back to $G_1$ from $G_2$ in two steps. All other identifications involve different segments and so are at least six steps away. Recall that this is the reason why the even-numbered segments in the $G_i$'s are not used: this ensures that no spurious RGB triangles are created. Thus, as desired, Eq. 3.4 holds and we have reduced 3SAT to $\mathrm{RES}(q_\triangle)$. □

We next show that the tripod query $q_{\mathsf{T}}$ is also hard. We do this by reducing the triangle to the tripod. Understanding this reduction is useful for understanding the proof of our main result.

Proposition 3.5 (Tripod $q_{\mathsf{T}}$ is hard). $\mathrm{RES}(q_{\mathsf{T}})$ and $\mathrm{RSP}(q_{\mathsf{T}})$ are NP-complete.

PROOF. First observe that in $q_{\mathsf{T}}$, $\mathrm{var}(A)$ is a subset of $\mathrm{var}(W)$. We say that $A$ dominates $W$ (Definition 3.7). It thus follows that when computing the resilience of $q_{\mathsf{T}}$, a tuple $W(a, b, c)$ is never needed in a minimum contingency set because it could always be replaced at least as efficiently by the tuple $A(a)$. It follows that we may assume that $W$ is exogenous, i.e., $\mathrm{RES}(q_{\mathsf{T}}) \equiv \mathrm{RES}(q'_{\mathsf{T}})$ where $q'_{\mathsf{T}} \,{:\!-}\, A(x), B(y), C(z), W^{\mathrm{x}}(x, y, z)$ (Prop. 3.8).

We now reduce $\mathrm{RES}(q_\triangle)$ to $\mathrm{RES}(q'_{\mathsf{T}})$. It will then follow that $\mathrm{RES}(q_{\mathsf{T}})$ is NP-complete, and thus so is $\mathrm{RSP}(q_{\mathsf{T}})$ by Lemma 2.8. Let $(D, k)$ be an instance of $\mathrm{RES}(q_\triangle)$. We construct an instance $(D', k)$ of $\mathrm{RES}(q'_{\mathsf{T}})$ by constructing relations $A$, $B$, $C$ as copies of $R$, $S$, $T$ from $D$. Define $D' = (A, B, C, W^{\mathrm{x}})$ as follows:

$A = \{\langle ab \rangle \mid R(a, b) \in D\}$

$B = \{\langle bc \rangle \mid S(b, c) \in D\}$

$C = \{\langle ca \rangle \mid T(c, a) \in D\}$

$W^{\mathrm{x}} = \{(\langle ab \rangle, \langle bc \rangle, \langle ca \rangle) \mid a, b, c \in \mathrm{dom}(D)\}$

Here, $\mathrm{dom}(D)$ is the set of domain elements of $D$ and $\langle ab \rangle$ stands for a new unique domain value resulting from the concatenation of domain values $a$ and $b$.

Observe that there is a 1:1 correspondence between the witnesses of $D \models q_\triangle$ and the witnesses of $D' \models q'_{\mathsf{T}}$. For example, $(a, b, c)$ is a witness that $D \models q_\triangle$ iff tuples $R(a, b)$, $S(b, c)$, $T(c, a)$ occur in $D$. This holds iff $(\langle ab \rangle, \langle bc \rangle, \langle ca \rangle)$ is a witness that $D' \models q'_{\mathsf{T}}$, i.e., the tuples $A(\langle ab \rangle)$, $B(\langle bc \rangle)$, $C(\langle ca \rangle)$, $W^{\mathrm{x}}(\langle ab \rangle, \langle bc \rangle, \langle ca \rangle)$ occur in $D'$. Thus, every contingency set for $q_\triangle$ in $D$ corresponds to a contingency set of the same size for $q'_{\mathsf{T}}$ in $D'$. It follows that $(D, k) \in \mathrm{RES}(q_\triangle) \Leftrightarrow (D', k) \in \mathrm{RES}(q'_{\mathsf{T}})$. □
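The triangle-to-tripod construction in this proof can be replayed on a small database. The sketch below makes several illustrative assumptions: the concatenated value $\langle ab \rangle$ is encoded as the Python pair `(a, b)`, all function names and the sample relations are hypothetical, and minimum contingency sets are found by exhaustive search (exponential, toy sizes only). `W` is treated as exogenous, so it never enters a contingency set.

```python
from itertools import combinations, product

def to_tripod(R, S, T):
    """Build D' = (A, B, C, W) from a triangle database D = (R, S, T)."""
    dom = {v for rel in (R, S, T) for t in rel for v in t}
    A, B, C = set(R), set(S), set(T)   # the pair (a, b) plays the role of <ab>
    W = {((a, b), (b, c), (c, a)) for a, b, c in product(dom, repeat=3)}
    return A, B, C, W

def triangle_witnesses(R, S, T):
    """Each witness is returned as the set of endogenous facts it uses."""
    return [frozenset({("R", (a, b)), ("S", (b, c)), ("T", (c, a))})
            for (a, b) in R for (b2, c) in S if b2 == b and (c, a) in T]

def tripod_witnesses(A, B, C, W):
    """W is exogenous, so only the A, B, C facts of a witness are recorded."""
    return [frozenset({("A", x), ("B", y), ("C", z)})
            for (x, y, z) in W if x in A and y in B and z in C]

def min_contingency(witnesses):
    """Fewest facts whose removal destroys every witness (exhaustive search)."""
    facts = sorted(set().union(*witnesses))
    for k in range(len(facts) + 1):
        for gamma in combinations(facts, k):
            if all(w & set(gamma) for w in witnesses):
                return k

R = {(1, 2), (1, 3), (5, 6)}
S = {(2, 4), (3, 4), (6, 7)}
T = {(4, 1), (7, 5)}
tri = triangle_witnesses(R, S, T)
trip = tripod_witnesses(*to_tripod(R, S, T))
print(len(tri), len(trip), min_contingency(tri), min_contingency(trip))
```

Both databases have the same number of witnesses and the same minimum contingency size, which is the 1:1 correspondence the proof relies on.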

While $q_\triangle$ and $q_{\mathsf{T}}$ appear to be very different, they share a key common structural property, which we define next.


Definition 3.6 (triad). A triad is a set of three endogenous atoms, $\mathcal{T} = \{S_0, S_1, S_2\}$ such that for every pair $i, j$, there is a path from $S_i$ to $S_j$ that uses no variable occurring in the other atom of $\mathcal{T}$.


Observe that atoms $R$, $S$, $T$ form a triad in $q_\triangle$ and atoms $A$, $B$, $C$ form a triad in $q_{\mathsf{T}}$ (see Fig. 2). For example, there is a path from $R$ to $S$ in $q_\triangle$ (across hyperedge $y$) that uses only variables (here $y$) that are not contained in the other atom (here $y \notin \mathrm{var}(T)$).

A triad is composed of endogenous atoms. Some atoms such as $W$ in $q_{\mathsf{T}}$ are given as endogenous, but are not needed in contingency sets. We will simplify the query by making all such atoms exogenous.


Definition 3.7 (Domination). If a query $q$ has endogenous atoms $A$, $B$ such that $\mathrm{var}(A) \subset \mathrm{var}(B)$, then we say that $A$ dominates $B$.$^{7}$


We already saw an example in Prop. 3.5: in $q_{\mathsf{T}}$, each of the atoms $A$, $B$, $C$ dominates $W$. The following proposition was proved in [Meliou et al. 2010]. Unfortunately, however, it was claimed to hold with respect to responsibility rather than resilience. As we will see later, this proposition fails for responsibility because the tuple we are computing the responsibility of may interfere with domination (Prop. 5.1).


Proposition 3.8 (Domination for resilience). Let $q$ be an sj-free CQ and $q'$ the query resulting from labeling some dominated atoms as exogenous. Then $\mathrm{RES}(q) \equiv \mathrm{RES}(q')$.


PROOF. Let $\Gamma$ be a minimum contingency set of $q$ in $D$. Suppose that atom $A$ dominates atom $B$ but there is some tuple $B(\mathbf{t}) \in \Gamma$. Let $\mathbf{p}$ be the projection of $\mathbf{t}$ onto $\mathrm{var}(A)$. Then we can replace $B(\mathbf{t})$ by $A(\mathbf{p})$ and we remove at least as many witnesses that $D \models q$. It follows, as desired, that the complexity of $\mathrm{RES}(q)$ is unchanged if $B$ is exogenous, i.e., $\mathrm{RES}(q) \equiv \mathrm{RES}(q')$. □


7 Recall that we never have the case of $\mathrm{var}(A) = \mathrm{var}(B)$.

When studying resilience, we follow the convention that all dominated atoms are exogenous. For example, $A$ dominates $R$ and $T$ in the query $q_{\mathrm{rats}}$, and in the query $q_{\mathrm{brats}}$ the atom $B$ additionally dominates $R$ and $S$. We thus transform the queries so that the dominated atoms are exogenous. Exogenous atoms have the superscript "x".

$$q_{\mathrm{r^x a t^x s}} \,{:\!-}\, A(x), R^{\mathrm{x}}(x, y), S(y, z), T^{\mathrm{x}}(z, x)$$

$$q_{\mathrm{b r^x a t^x s^x}} \,{:\!-}\, A(x), R^{\mathrm{x}}(x, y), B(y), S^{\mathrm{x}}(y, z), T^{\mathrm{x}}(z, x) \tag{3.9}$$

By Prop. 3.8, $\mathrm{RES}(q_{\mathrm{rats}}) \equiv \mathrm{RES}(q_{\mathrm{r^x a t^x s}})$ and $\mathrm{RES}(q_{\mathrm{brats}}) \equiv \mathrm{RES}(q_{\mathrm{b r^x a t^x s^x}})$.

We now prove our first main result. 

LEMMA 3.10 (Triads make $\mathrm{RES}(q)$ hard). Let $q$ be an sj-free CQ where all dominated atoms are exogenous. If $q$ has a triad, then $\mathrm{RES}(q)$ is NP-complete.

PROOF. Let $q$ be a query with triad $\mathcal{T} = \{S_0, S_1, S_2\}$. We build a reduction from $\mathrm{RES}(q_\triangle)$ to $\mathrm{RES}(q)$. Given any $D$ that satisfies $q_\triangle$ we will produce a database $D'$ that satisfies $q$ such that for all $k$:

$$(D, k) \in \mathrm{RES}(q_\triangle) \;\Leftrightarrow\; (D', k) \in \mathrm{RES}(q) \tag{3.11}$$

We will assume that no variable is shared by all three elements of $\mathcal{T}$ (we can ignore any such variable by setting it to a constant). Our proof splits into two cases:

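Case 1: $\mathrm{var}(S_0)$, $\mathrm{var}(S_1)$, $\mathrm{var}(S_2)$ are pairwise disjoint: Our reduction is similar to the reduction from $q_\triangle$ to $q_{\mathsf{T}}$ (Prop. 3.5).

We first define the triad relations in $D'$:

$$\begin{aligned} S_0 &= \{(\langle ab \rangle, \ldots, \langle ab \rangle) \mid R(a, b) \in D\} \\ S_1 &= \{(\langle bc \rangle, \ldots, \langle bc \rangle) \mid S(b, c) \in D\} \\ S_2 &= \{(\langle ca \rangle, \ldots, \langle ca \rangle) \mid T(c, a) \in D\} \end{aligned} \tag{3.12}$$

Thus, each tuple of, for example, $S_0$ consists of identical entries with value $\langle ab \rangle$ for each pair $R(a, b) \in D$. Thus, $S_0$, $S_1$, $S_2$ mirror $R$, $S$, $T$, respectively.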

To define all the relations corresponding to the other atoms $A_i$ of $D'$, we first partition the variables of $q$ into 4 disjoint sets: $\mathrm{var}(q) = \mathrm{var}(S_0) \cup \mathrm{var}(S_1) \cup \mathrm{var}(S_2) \cup V_3$. Now for each atom $A_i$, arrange its variables in these four groups. Then define the relation $R'_i$ of $D'$ corresponding to atom $A_i$ as follows:

$$R'_i = \{(\langle ab \rangle; \langle bc \rangle; \langle ca \rangle; \langle abc \rangle) \mid D \models q_\triangle(a, b, c)\} \tag{3.13}$$

For example, all the variables $v \in \mathrm{var}(S_0)$ are assigned the value $\langle ab \rangle$ and all the variables $v \in V_3$ are assigned $\langle abc \rangle$.

By the definition of triad, there is a path from $S_0$ to $S_1$ not using any edges (variables) from $\mathrm{var}(S_2)$. Thus, any witness of $D' \models q$ that includes occurrences of $\langle ab \rangle$ and $\langle b'c \rangle$ must have $b = b'$.

Similarly, a path from $S_1$ to $S_2$ guarantees that $c$ is preserved and a path from $S_2$ to $S_0$ guarantees that $a$ is preserved. It follows that the witnesses that $D' \models q$ are essentially identical to the witnesses that $D \models q_\triangle(x, y, z)$ (see Fig. 6).$^{8}$ Furthermore, any minimum contingency set only needs tuples from $S_0$, $S_1$ or $S_2$. Thus the sizes of minimum contingency sets are preserved, i.e., Eq. 3.11 holds, as desired. Thus $\mathrm{RES}(q)$ is NP-complete.


8 More precisely, if $(a, b, c)$ is a witness that $D \models q_\triangle$, then $(\langle ab \rangle, \langle bc \rangle, \langle ca \rangle, \langle abc \rangle, a, b, c)$ is a witness that $D' \models q$, with the variables partitioned according to Eq. 3.14, and these are the only possible such witnesses.

Case 2: $\mathrm{var}(S_i) \cap \mathrm{var}(S_j) \ne \emptyset$ for some $i \ne j$: We generalize the construction from Case 1 as follows. Partition $\mathrm{var}(S_i)$ into those unshared, those shared with $S_{i-1}$, and those shared with $S_{i+1}$ (addition here is mod 3).

We then assign the relations of the triad as follows:

$S_0 = \{(\langle ab \rangle; a; b) \mid R(a, b) \in D\}$

$S_1 = \{(\langle bc \rangle; b; c) \mid S(b, c) \in D\}$

$S_2 = \{(\langle ca \rangle; c; a) \mid T(c, a) \in D\}$

Since none of the $S_i$'s is dominated, both $a$ and $b$ occur in each tuple of $S_0$, both $b$ and $c$ in each tuple of $S_1$, and both $c$ and $a$ in each tuple of $S_2$. Thus, as in Case 1, $S_0$, $S_1$, $S_2$ capture $R$, $S$, $T$, respectively. The key idea is now that we partition all the variables $\mathrm{var}(q)$ into 7 sets according to their respective appearance in each of the 3 tables. For each assignment of $x$, $y$, $z$ to values $a$, $b$, $c$ in $D$, we will then make assignments to the variables according to their partition:


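$$\begin{array}{lll}
\text{set name} & \text{variable partition} & \text{assignment} \\ \hline
V_0 & \mathrm{var}(S_0) - (\mathrm{var}(S_1) \cup \mathrm{var}(S_2)) & \langle ab \rangle \\
V_1 & \mathrm{var}(S_1) - (\mathrm{var}(S_0) \cup \mathrm{var}(S_2)) & \langle bc \rangle \\
V_2 & \mathrm{var}(S_2) - (\mathrm{var}(S_0) \cup \mathrm{var}(S_1)) & \langle ca \rangle \\
V_3 & \mathrm{var}(q) - (\mathrm{var}(S_0) \cup \mathrm{var}(S_1) \cup \mathrm{var}(S_2)) & \langle abc \rangle \\
V_4 & \mathrm{var}(S_2) \cap \mathrm{var}(S_0) & a \\
V_5 & \mathrm{var}(S_0) \cap \mathrm{var}(S_1) & b \\
V_6 & \mathrm{var}(S_1) \cap \mathrm{var}(S_2) & c
\end{array} \tag{3.14}$$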


We then define the relations in $D'$ corresponding to each of the other atoms $A$ of $q$ to be the following set of tuples, where the only difference is which of the 7 members of the partition of variables occurs in $\mathrm{var}(A)$.

$$\{(\langle ab \rangle; \langle bc \rangle; \langle ca \rangle; \langle abc \rangle; a; b; c) \mid D \models q_\triangle(a, b, c)\} \tag{3.15}$$

By the definition of a triad, there is a path from $S_0$ to $S_1$ not using any edges (variables) from $S_2$. Thus, "$b$" is always present (see Eq. 3.14). Thus, any witness including occurrences of some of $\langle ab \rangle$, $b'$, $\langle b''c \rangle$ must have $b = b' = b''$. Thus, as in Case 1, the witnesses of $D' \models q$ are essentially identical to the witnesses of $D \models q_\triangle$ and we have reduced $\mathrm{RES}(q_\triangle)$ to $\mathrm{RES}(q)$ (see Fig. 6). □
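The seven-way partition of Eq. 3.14 amounts to a rule that tells, for one triangle witness $(a, b, c)$, which value each variable of $q$ receives. A minimal sketch of that rule follows. The function name `assignment`, the tagged tuples standing in for the concatenated values, and the query encoding are assumptions for illustration; like the proof, the sketch assumes that no variable occurs in all three triad atoms.

```python
def assignment(query, triad, a, b, c):
    """Value of every variable of q for one triangle witness (a, b, c), per Eq. 3.14.

    Assumes no variable occurs in all three atoms of the triad.
    """
    vars_of = {name: set(vs) for name, vs in query}
    s0, s1, s2 = (vars_of[name] for name in triad)
    env = {}
    for v in set().union(*vars_of.values()):
        in0, in1, in2 = v in s0, v in s1, v in s2
        if in2 and in0:
            env[v] = a                     # V4: shared by S2 and S0
        elif in0 and in1:
            env[v] = b                     # V5: shared by S0 and S1
        elif in1 and in2:
            env[v] = c                     # V6: shared by S1 and S2
        elif in0:
            env[v] = ("ab", a, b)          # V0: only in S0, stands for <ab>
        elif in1:
            env[v] = ("bc", b, c)          # V1: only in S1, stands for <bc>
        elif in2:
            env[v] = ("ca", c, a)          # V2: only in S2, stands for <ca>
        else:
            env[v] = ("abc", a, b, c)      # V3: in no triad atom, stands for <abc>
    return env

def atom_tuple(variables, env):
    """The tuple that an atom with these variables receives in D'."""
    return tuple(env[v] for v in variables)

Q_TRIPOD = [("A", ("x",)), ("B", ("y",)), ("C", ("z",)), ("W", ("x", "y", "z"))]
Q_TRIANGLE = [("R", ("x", "y")), ("S", ("y", "z")), ("T", ("z", "x"))]

env_tripod = assignment(Q_TRIPOD, ("A", "B", "C"), 1, 2, 3)      # Case 1
env_triangle = assignment(Q_TRIANGLE, ("R", "S", "T"), 1, 2, 3)  # Case 2
print(atom_tuple(("x", "y", "z"), env_tripod))
print(atom_tuple(("x", "y"), env_triangle))
```

For the tripod (pairwise disjoint triad atoms) the rule reproduces the construction of Prop. 3.5, and for the triangle itself (all variables shared pairwise) it degenerates to the identity mapping.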


3.2. Polynomial algorithm for linear queries 

We just showed that resilience for queries with triads is NP-complete. Next we will prove a strong converse: resilience for triad-free queries is in PTIME. We start by defining a class of queries for which resilience is known to be in PTIME.

Definition 3.16 (Linear Query). A query q is linear if its atoms may be arranged in 
a linear order such that each variable occurs in a contiguous sequence of atoms. 


Example 3.17 (Linear Query). Geometrically, a query is linear if all of the vertices of its hypergraph can be drawn along a straight line and all of its hyperedges can be drawn as convex regions. For example, the following query is linear: $q \,{:\!-}\, A(x), R(x, y), S(y, z)$ (see Fig. 7).
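Definition 3.16 can be tested directly by trying every ordering of the atoms. The sketch below is an illustration only: `linear_order` and the query encoding are hypothetical names, and the search is factorial in the number of atoms, which is tolerable because it depends on the query and not on the data.

```python
from itertools import permutations

def linear_order(query):
    """Return an atom order in which every variable is contiguous, or None."""
    for order in permutations(query):
        contiguous = True
        for v in {v for _, vs in order for v in vs}:
            pos = [i for i, (_, vs) in enumerate(order) if v in vs]
            if pos[-1] - pos[0] + 1 != len(pos):   # a gap in the positions
                contiguous = False
                break
        if contiguous:
            return [name for name, _ in order]
    return None

Q_LINEAR = [("A", ("x",)), ("R", ("x", "y")), ("S", ("y", "z"))]
Q_TRIANGLE = [("R", ("x", "y")), ("S", ("y", "z")), ("T", ("z", "x"))]
Q_TRIPOD = [("A", ("x",)), ("B", ("y",)), ("C", ("z",)), ("W", ("x", "y", "z"))]

print(linear_order(Q_LINEAR))     # ['A', 'R', 'S']
print(linear_order(Q_TRIANGLE))   # None
print(linear_order(Q_TRIPOD))     # None
```

The triangle fails because the variable shared by the two outer atoms always skips the middle one, and the tripod fails because `W` would need three neighbours on a line.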


The responsibility of linear queries is known to be in PTIME and thus by Lemma 2.8 resilience of linear queries is in PTIME as well.

FACT 3.18 (Linear queries in PTIME [Meliou et al. 2010]). For any linear sj-free CQ $q$, $\mathrm{RSP}(q)$ (and thus also $\mathrm{RES}(q)$) are in PTIME.

Fig. 6: Reduction from $\mathrm{RES}(q_\triangle)$ to $\mathrm{RES}(q)$ when $q$ contains a triad $\{S_0, S_1, S_2\}$ in the proof of Lemma 3.10.

Fig. 7: Example 3.17: Linear query $q \,{:\!-}\, A(x), R(x, y), S(y, z)$.


Proof. We give the proof for completeness and because we will need an extension of the proof for a later result (Lemma 5.16).

Let $q \,{:\!-}\, A_1(\mathbf{z}_1) \wedge \cdots \wedge A_r(\mathbf{z}_r)$ be a linear query, arranged in its linear ordering. We first show that $\mathrm{RES}(q) \in \mathrm{PTIME}$. Let $D \models q$. We construct a network $N = N(q, D)$ as follows. $N$ is an $(r+1)$-partite graph consisting of vertices $V = \{s\} \cup P_1 \cup P_2 \cup \cdots \cup P_{r-1} \cup \{t\}$. Each edge of $N$ has weight 1 and corresponds to exactly one tuple $A_i(\mathbf{a}) \in D$. $P_i$ is the projection onto $\mathrm{var}(A_i) \cap \mathrm{var}(A_{i+1})$ of $A_i \bowtie A_{i+1}$. The edge corresponding to $A_i(\mathbf{a})$ is $(\pi_{\mathrm{var}(A_{i-1}) \cap \mathrm{var}(A_i)}(\mathbf{a}), \pi_{\mathrm{var}(A_i) \cap \mathrm{var}(A_{i+1})}(\mathbf{a}))$. However, $s$ is the starting point of all the $A_1$ edges, and $t$ is the endpoint of all the $A_r$ edges (see Fig. 8).

With this construction, a cut in $N(q, D)$ is exactly a contingency set for $(q, D)$ and thus a min cut is exactly a minimum contingency set. Thus we have reduced $\mathrm{RES}(q)$ to network flow.

A similar but more complicated construction shows how to use network flow to compute the responsibility of tuple $d \in D$ for the linear query $q$. We construct the same network $N(q, D)$ but now we modify some of the edge weights. We want to compute the minimum size of a contingency set $\Gamma$ such that $D - \Gamma \models q$ but $D - (\Gamma \cup \{d\}) \not\models q$. Consider all the witnesses $w$ that $D \models q$ such that $w$ extends $d$. For any contingency set $\Gamma$ for $d$, at least one such $w$ must witness $D - \Gamma \models q$. Thus, $\Gamma$ must be disjoint from $w$. Observe that a contingency set for $d$ which is disjoint from $w$ is a cut of $N(q, D)$ which removes $d$ but leaves the rest of $w$. The minimum weight of such a contingency set is exactly the min cut of $N_w(q, D)$ which is formed from $N(q, D)$ by changing the weight of $d$ to 0 (as it is removed at no cost) and changing the weights of all the edges in $w - d$ to $\infty$: they cannot be removed. Thus, the responsibility of $d$ is the minimum over all witnesses $w$ extending $d$ of the min cut of $N_w(q, D)$. We illustrate this construction for the query from Example 3.17 in Fig. 8.

Fig. 8: Network flow in the proof of Fact 3.18 illustrated for query $q \,{:\!-}\, A(x), R(x, y), S(y, z)$ from Fig. 7 and database $D = \{A, R, S\}$, where $A = \{a_1, a_2, a_3\}$, $R = \{(a_1, b_1), (a_1, b_2), (a_2, b_2), (a_3, b_3)\}$, $S = \{(b_1, c_1), (b_1, c_2), (b_2, c_2), (b_3, c_3)\}$. The drawing on the left is $N(q, D)$, the result of the reduction from $\mathrm{RES}(q, D)$ to network flow. The drawing on the right is $N_w(q, D)$ where we are computing the responsibility of $d = R(a_1, b_2)$ and $w = (a_1, b_2, c_2)$.


Thus we have shown that the complexity of computing $\mathrm{RES}(q)$ is at most that of network flow. On the other hand, $\mathrm{RSP}(q)$ may be computed by computing network flow of all the networks $N_w(q, D)$. For each fixed $q$, there are at most $O(n^r)$ such $w$. Thus, for each $q$, $\mathrm{RSP}(q) \in \mathrm{PTIME}$. Note that for linear queries, the complexity of resilience is no more than the complexity of network flow. The complexity of responsibility, in contrast, is in PTIME for each fixed $q$, but we do not currently have a fixed upper bound on the size of the exponent. □
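The resilience half of this proof can be sketched in code for the database of Fig. 8. The following is a hedged illustration rather than the authors' implementation: it assumes the atoms are already listed in a linear order, treats every atom as endogenous (each tuple is one unit-capacity edge), and uses a plain Edmonds-Karp max-flow routine chosen here for brevity; the function names are hypothetical.

```python
from collections import defaultdict, deque

def max_flow(cap, s, t):
    """Edmonds-Karp on a dict-of-dicts of residual capacities (mutated in place)."""
    flow = 0
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:          # BFS for an augmenting path
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow                            # max flow = min cut
        bottleneck, v = float("inf"), t
        while parent[v] is not None:
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        v = t
        while parent[v] is not None:               # push flow along the path
            u = parent[v]
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck
            v = u
        flow += bottleneck

def resilience_linear(query, db):
    """Min cut of N(q, D); the atoms of `query` must be in a linear order."""
    cap = defaultdict(lambda: defaultdict(int))
    r = len(query)
    for i, (name, vs) in enumerate(query):
        left = sorted(v for v in vs if i > 0 and v in query[i - 1][1])
        right = sorted(v for v in vs if i < r - 1 and v in query[i + 1][1])
        for tup in db[name]:
            env = dict(zip(vs, tup))
            u = "s" if i == 0 else (i - 1, tuple(env[v] for v in left))
            w = "t" if i == r - 1 else (i, tuple(env[v] for v in right))
            cap[u][w] += 1     # one unit-weight edge per tuple
            cap[w][u] += 0     # make sure the reverse residual edge exists
    return max_flow(cap, "s", "t")

Q = [("A", ("x",)), ("R", ("x", "y")), ("S", ("y", "z"))]
D = {"A": {("a1",), ("a2",), ("a3",)},
     "R": {("a1", "b1"), ("a1", "b2"), ("a2", "b2"), ("a3", "b3")},
     "S": {("b1", "c1"), ("b1", "c2"), ("b2", "c2"), ("b3", "c3")}}
print(resilience_linear(Q, D))  # 3
```

For this database the minimum cut consists of the three `A` edges leaving the source. Exogenous atoms are not handled by the sketch; one natural extension would give their edges infinite capacity, in the same spirit as the weights used for $N_w(q, D)$.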


If all queries without a triad were linear, then this would complete the dichotomy 
theorem for resilience. While this is not the case, we will show that any triad-free query 
can be transformed into a query of equivalent complexity that is linear. 

Recall that when studying resilience, we make dominated atoms exogenous (Prop. 3.8). This is done, for example, to the rats and brats queries, i.e., $\mathrm{RES}(q_{\mathrm{rats}}) \equiv \mathrm{RES}(q_{\mathrm{r^x a t^x s}})$ and $\mathrm{RES}(q_{\mathrm{brats}}) \equiv \mathrm{RES}(q_{\mathrm{b r^x a t^x s^x}})$ (see Eq. 3.9). Neither $q_{\mathrm{r^x a t^x s}}$ nor $q_{\mathrm{b r^x a t^x s^x}}$ is linear. However, they can be transformed to linear queries without changing their complexity via the following transformation from [Meliou et al. 2010]:


Definition 3.19 (Dissociation). Let $A^{\mathrm{x}}$ be an exogenous atom in a query $q$, and $v \in \mathrm{var}(q)$ a variable that does not occur in $A^{\mathrm{x}}$. Let $q'$ be the same as $q$ except that we add $v$ to the arguments of $A^{\mathrm{x}}$. This transformation is called dissociation.


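Example 3.20 (Dissociation). The queries $q_{\mathrm{r^x a t^x s}}$ and $q_{\mathrm{b r^x a t^x s^x}}$ (Eq. 3.9) have no triads but they are not linear. However, applying certain dissociations, we obtain the following linear queries:

$q'_{\mathrm{r^x a t^x s}} \,{:\!-}\, A(x), R^{\mathrm{x}}(x, y, z), S(y, z), T^{\mathrm{x}}(x, y, z)$

$q'_{\mathrm{b r^x a t^x s^x}} \,{:\!-}\, A(x), R^{\mathrm{x}}(x, y, z), B(y), S^{\mathrm{x}}(x, y, z), T^{\mathrm{x}}(x, y, z)$

Note also that $q'_{\mathrm{r^x a t^x s}}$ and $q'_{\mathrm{b r^x a t^x s^x}}$ have duplicate atoms which we finally delete, without affecting their complexity:

$q''_{\mathrm{r^x a t^x s}} \,{:\!-}\, A(x), R^{\mathrm{x}}(x, y, z), S(y, z)$

$q''_{\mathrm{b r^x a t^x s^x}} \,{:\!-}\, A(x), R^{\mathrm{x}}(x, y, z), B(y)$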

The key fact is that dissociation cannot decrease the complexity of resilience or responsibility.

Lemma 3.21 (Dissociation increases complexity [Meliou et al. 2010]). If $q'$ is obtained from $q$ through dissociation, then $\mathrm{RES}(q) \le \mathrm{RES}(q')$.

PROOF. Let $R^{\mathrm{x}}(\mathbf{z})$ be the atom that has been changed to $R^{\mathrm{x}\prime}(\mathbf{z}, v)$. We reduce $\mathrm{RES}(q)$ to $\mathrm{RES}(q')$ by mapping $(D, k)$ to $(D', k)$ where $D'$ is the same as $D$ with the exception that we let $R^{\mathrm{x}\prime} = \{(\mathbf{t}, d) \mid R^{\mathrm{x}}(\mathbf{t}) \in D;\, d \in \mathrm{dom}(D)\}$. This transformation does not change the witness set nor the contingency sets, because, by the way we formed $R^{\mathrm{x}\prime}$ from $R^{\mathrm{x}}$, the conjunct $R^{\mathrm{x}\prime}(\mathbf{z}, v)$ places the same restriction on $D'$ that $R^{\mathrm{x}}(\mathbf{z})$ places on $D$. □
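The database transformation in this proof is easy to replay. The sketch below pads an exogenous relation with every domain value and checks that the set of witnesses is unchanged. The names `witnesses` and `dissociate`, the query encoding, and the sample data are assumptions made for illustration; the example dissociates $R^{\mathrm{x}}(x, y)$ into $R^{\mathrm{x}}(x, y, z)$ in the rats query with its dominated atoms made exogenous.

```python
from itertools import product

def witnesses(query, db):
    """All satisfying assignments, as value tuples over the sorted variables."""
    variables = sorted({v for _, vs in query for v in vs})
    domain = sorted({c for rel in db.values() for t in rel for c in t})
    found = set()
    for values in product(domain, repeat=len(variables)):
        env = dict(zip(variables, values))
        if all(tuple(env[v] for v in vs) in db[r] for r, vs in query):
            found.add(values)
    return found

def dissociate(query, db, rel, var):
    """Add `var` to exogenous atom `rel`; pad its tuples with every domain value."""
    dom = {c for ts in db.values() for t in ts for c in t}
    query2 = [(r, vs + (var,)) if r == rel else (r, vs) for r, vs in query]
    db2 = dict(db)
    db2[rel] = {t + (d,) for t in db[rel] for d in dom}
    return query2, db2

# A(x), R^x(x, y), S(y, z), T^x(z, x); only A and S are endogenous.
Q = [("A", ("x",)), ("R", ("x", "y")), ("S", ("y", "z")), ("T", ("z", "x"))]
D = {"A": {(1,)}, "R": {(1, 2)}, "S": {(2, 3), (2, 4)}, "T": {(3, 1)}}

Q2, D2 = dissociate(Q, D, "R", "z")
print(witnesses(Q, D), witnesses(Q2, D2))
```

Because the padded relation accepts every value for the new variable, it constrains exactly the same variables as before, so witnesses and therefore contingency sets over the endogenous atoms are preserved.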

The other direction does not hold, i.e., dissociation may strictly increase the complexity of the resilience of a query.$^{9}$ It follows from Lemma 3.21 that if $q$ can be dissociated to a linear query, then $\mathrm{RES}(q) \in \mathrm{PTIME}$. In particular, the above dissociations of $q_{\mathrm{r^x a t^x s}}$ and $q_{\mathrm{b r^x a t^x s^x}}$ prove that $\mathrm{RES}(q_{\mathrm{r^x a t^x s}})$ and $\mathrm{RES}(q_{\mathrm{b r^x a t^x s^x}})$ are in PTIME. Thus, since the transformations from $q_{\mathrm{rats}}$ to $q_{\mathrm{r^x a t^x s}}$ and $q_{\mathrm{brats}}$ to $q_{\mathrm{b r^x a t^x s^x}}$ preserve the complexity of resilience, we conclude that $\mathrm{RES}(q_{\mathrm{rats}})$ and $\mathrm{RES}(q_{\mathrm{brats}})$ are easy. Later we will see that, for responsibility, $\mathrm{RSP}(q_{\mathrm{brats}}) \in \mathrm{PTIME}$ but $\mathrm{RSP}(q_{\mathrm{rats}})$ is NP-complete (Prop. 5.1).


COROLLARY 3.22. $\mathrm{RES}(q_{\mathrm{rats}})$ and $\mathrm{RES}(q_{\mathrm{brats}})$ are in PTIME.

Later we will see that it is also true that dissociation does not decrease the complexity of responsibility, but the proof is more subtle (Lemma 5.15).

Now we are ready to show that $\mathrm{RES}(q)$ is easy if $q$ is triad-free. We will show that for every triad-free query, we can linearize the endogenous atoms and use some dissociations to make the exogenous atoms fit into the same order.

Lemma 3.23 (Queries without triads are easy). Let $q$ be an sj-free CQ that has no triad. Then $\mathrm{RES}(q)$ is in PTIME.

PROOF. Let $q$ be a triad-free query. We prove by induction on the number of endogenous atoms in $q$ that we can transform it into a linear query by using dissociations. Since dissociations cannot decrease complexity (Lemma 3.21) and resilience is easy for linear queries (Fact 3.18), it follows that $\mathrm{RES}(q)$ is in PTIME.

Base case: $q$ has fewer than three endogenous atoms. Consider $S_1$, $S_2$ the endogenous atoms of $q$. Using dissociation, we add all the variables to all the exogenous atoms. Thus all the exogenous atoms are identical and we can remove all but one, call it $E^{\mathrm{x}}_1$. The resulting query, $q'$, is linear with ordering $S_1, E^{\mathrm{x}}_1, S_2$. Thus $\mathrm{RES}(q) \in \mathrm{PTIME}$.

Inductive case: assume true for triad-free queries with $n$ endogenous atoms. Let $q_{n+1}$ be triad-free and have $n+1$ endogenous atoms. We now describe a way to linearize these atoms. For each endogenous atom $S_i$, let $c_i$ be the cut of the hypergraph resulting from removing all the variables of $S_i$, i.e., all the hyperedges that touch $S_i$. These cuts are drawn as dotted vertical lines in Fig. 9.


Let $S_1$ and $S_2$ be two endogenous atoms and draw $S_2$ to the right of $S_1$. Now consider a third endogenous atom $S_3$. Since $q_{n+1}$ is connected and has no triads, there is a unique $i \in \{1, 2, 3\}$ such that the cut $c_i$ disconnects the two atoms in $\{S_1, S_2, S_3\} - \{S_i\}$.

Thus we must place $S_i$ between the other two. In other words, there is exactly one place that $S_3$ can be added to the figure: to the left of $S_1$ if $c_1$ separates $S_3$ from $S_2$; in between $S_1$ and $S_2$ if $c_3$ separates $S_1$ from $S_2$; or to the right of $S_2$ if $c_2$ separates $S_1$ from $S_3$.

For example, let $S_1(x, y)$ and $S_2(y, z)$ be the first two endogenous atoms. Let the third be $S_3(z, w)$ which shares a variable with $S_2$. Note that $c_3$ does not separate $S_1$ from $S_2$ and $c_1$ does not separate $S_2$ from $S_3$. Since $q_{n+1}$ has no triad, it must be the case that $c_2$ separates $S_1$ from $S_3$. Thus, the order in this case must be $S_1, S_2, S_3$.

9 For example, the query $q \,{:\!-}\, A(x), W^{\mathrm{x}}_1(x, y), B(y), W^{\mathrm{x}}_2(y, z), C(z)$ is linear, but by applying dissociation we can transform it to $q'_{\mathsf{T}}$.

Fig. 9: A walk along the endogenous atoms in the proof of Lemma 3.23. The cut $c_i$ results from removing all the variables (edges) from atom $S_i$.

Now add the remaining endogenous atoms one at a time. Since $q_{n+1}$ has no triad, by the above observation, there is exactly one place that each next endogenous atom may be placed. Finally, once all the endogenous atoms have been placed, renumber them so that from left to right they are $S_1, S_2, \ldots, S_{n+1}$.

Define the query $q_n$ to be the result of removing all the variables in $\mathrm{var}(S_{n+1}) - \mathrm{var}(S_n)$ and removing all the atoms in which any of those removed variables occurred. In Fig. 9 this corresponds to removing everything to the right of $c_n$.

By our inductive hypothesis, there is a query $q'_n$ that is the result of doing some dissociations to $q_n$, and $q'_n$ is linear. Furthermore, by our observation above, the ordering of the endogenous atoms remains $S_1, S_2, \ldots, S_n$.

Now, we form $q'_{n+1}$ by first adding back to $q'_n$ all the variables and atoms that we removed. Note that we are thus adding back just one endogenous atom, $S_{n+1}$, together with zero or more exogenous atoms, all of which contain some variables in $\mathrm{var}(S_{n+1}) - \mathrm{var}(S_n)$. Finally, to all these exogenous atoms that we have just added back (if any), add all the variables in $\mathrm{var}(S_n) \cup \mathrm{var}(S_{n+1})$, together with any other variables occurring in any of these exogenous atoms. Thus all the newly re-added exogenous atoms are identical and we can combine them into one, call it $E^{\mathrm{x}}_n$. Note that $c_n$ still separates $E^{\mathrm{x}}_n$ and $S_{n+1}$ from the rest of the hypergraph.

Thus, we have transformed $q_{n+1}$ to a linear query $q'_{n+1}$ such that $\mathrm{RES}(q_{n+1}) \le \mathrm{RES}(q'_{n+1})$. Thus $\mathrm{RES}(q_{n+1}) \in \mathrm{PTIME}$ as desired. □


3.3. Dichotomy of resilience 


Combining Lemmas 3.10 and 3.23 leads to our first dichotomy result on the complexity 
of resilience: 


Theorem 3.24 (Dichotomy of resilience). Let $q$ be an sj-free CQ and let $q'$ be the result of making all dominated atoms exogenous. If $q'$ has a triad, then $\mathrm{RES}(q)$ is NP-complete, otherwise it is in PTIME.

Note that it is easy to tell whether $q$ has a triad. Checking whether a given triple of atoms is a triad consists of three reachability problems (is there a path from $S_i$ to $S_j$ not using any of the edges in $\mathrm{var}(S_k)$?) and is thus doable in linear time.

An exhaustive search of all endogenous triples thus provides a PTIME algorithm:

COROLLARY 3.25. We can check in polynomial time in the size of the query $q$ whether $\mathrm{RES}(q)$ is NP-complete or in PTIME.
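The exhaustive search behind Corollary 3.25 can be sketched as follows: first treat dominated atoms as exogenous (the convention of Prop. 3.8), then test every triple of remaining endogenous atoms with three reachability checks. The function name `find_triad`, the query encoding, and the optional `endogenous` argument are assumptions made for illustration, and the simple graph search below is not tuned to the linear-time bound mentioned in the text.

```python
from itertools import combinations

def find_triad(query, endogenous=None):
    """Return a triad of the query as a tuple of atom names, or None."""
    vars_of = {name: set(vs) for name, vs in query}
    endo = set(vars_of) if endogenous is None else set(endogenous)
    # Convention of Prop. 3.8: dominated atoms are treated as exogenous.
    endo = {b for b in endo
            if not any(a != b and vars_of[a] < vars_of[b] for a in endo)}

    def reachable(start, goal, forbidden):
        """Path from start to goal using no variable in `forbidden`?"""
        seen, stack = {start}, [start]
        while stack:
            u = stack.pop()
            if u == goal:
                return True
            for w in vars_of:
                if w not in seen and (vars_of[u] & vars_of[w]) - forbidden:
                    seen.add(w)
                    stack.append(w)
        return False

    for triple in combinations(sorted(endo), 3):
        if all(reachable(triple[i], triple[j], vars_of[triple[k]])
               for i, j, k in ((0, 1, 2), (1, 2, 0), (2, 0, 1))):
            return triple
    return None

Q_TRIANGLE = [("R", ("x", "y")), ("S", ("y", "z")), ("T", ("z", "x"))]
Q_TRIPOD = [("A", ("x",)), ("B", ("y",)), ("C", ("z",)), ("W", ("x", "y", "z"))]
Q_RATS = [("A", ("x",)), ("R", ("x", "y")), ("S", ("y", "z")), ("T", ("z", "x"))]
Q_BRATS = [("A", ("x",)), ("R", ("x", "y")), ("B", ("y",)),
           ("S", ("y", "z")), ("T", ("z", "x"))]

for q in (Q_TRIANGLE, Q_TRIPOD, Q_RATS, Q_BRATS):
    triad = find_triad(q)
    print("NP-complete" if triad else "PTIME", triad)
```

On the four queries of Example 3.2 the sketch reports a triad for the triangle and the tripod and none for rats and brats, matching Propositions 3.3 and 3.5 and Corollary 3.22.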


4. FUNCTIONAL DEPENDENCIES 

Functional dependencies (FDs), such as key constraints, restrict the set of allowable data instances. In this section, we characterize how these restrictions affect the complexity of resilience. We first show that FDs cannot increase the complexity of the resilience of a query (Prop. 4.1). Next we introduce a transformation of queries suggested by a given set of FDs, called induced rewrites (Def. 4.4). We show that induced rewrites preserve the complexity of resilience (Lemma 4.5).

We call a query closed if all possible induced rewrites have been applied (Def. 4.4). We conjectured that induced rewrites capture the full power of FDs with respect to the complexity of resilience; in other words, the complexity of the resilience of a closed query is unchanged if we remove its FDs (Conjecture 4.7).

We prove that the complexity of resilience for closed queries that have triads is NP-complete (Lemma 4.8). On the other hand, even without its FDs, we know that a closed query that has no triads has an easy resilience problem (Lemma 3.23). We thus conclude that in the presence of FDs, the dichotomy (still determined by the presence or absence of triads, but now in the closure of the query) remains in force (Lemma 3.23). It follows as a corollary that Conjecture 4.7 holds.


4.1. FDs can only simplify resilience 

We write $\mathrm{RES}(q; \Phi)$ to refer to the resilience problem for query $q$, restricted to databases
satisfying the set of FDs $\Phi$. Note that since we are always considering conjunctive
queries, any particular FD either holds or does not hold on the whole query, so it is not
necessary to mention which atom the FD is applied to.

First we observe that FDs cannot make the resilience problem harder: 


Proposition 4.1 (FDs do not increase complexity). Let $q$ be an sj-free CQ
and $\Phi$ a set of functional dependencies. Then $\mathrm{RES}(q; \Phi) \le \mathrm{RES}(q)$.

PROOF. The reduction is the identity function. Note that $\mathrm{RES}(q; \Phi)$ is just the
restriction of $\mathrm{RES}(q)$ to databases satisfying $\Phi$. Thus, for all databases $D$ that satisfy $\Phi$:
$(D, k) \in \mathrm{RES}(q; \Phi) \iff (D, k) \in \mathrm{RES}(q)$. □

Corollary 4.2 (Triad-free queries are still easy). If $q$ is an sj-free CQ
that has no triad, and therefore $\mathrm{RES}(q)$ is in PTIME, then $\mathrm{RES}(q; \Phi)$ is also in PTIME.


We next show that for some queries, FDs do in fact reduce the complexity of
resilience. Recall that the tripod query, $q_{\mathrm{T}}$, is hard (Prop. 3.5). However, $q_{\mathrm{T}}$ becomes
polynomial when we add the FD $\varphi = x \to y$.


Proposition 4.3 (FDs make $q_{\mathrm{T}}$ easy).

$\mathrm{RES}(q_{\mathrm{T}}; \{x \to y\}) \in$ PTIME.


We will prove Prop. 4.3 along the way, as we learn about the effect of FDs.
Recall that the tripod query $q_{\mathrm{T}}$ has the triad $\{A, B, C\}$. Notice that the FD $x \to y$
"disarms" this triad because $A$ and $B$ are no longer independent. More
explicitly, once we know $x$, we also know $y$. Thus $\mathrm{RES}(q_{\mathrm{T}}; \{x \to y\}) \equiv \mathrm{RES}(r)$ where
$r \,{:}{-}\, A'(x,y), B(y), C(z), W^{x}(x,y,z)$ (Lemma 4.5). Furthermore, since $B$ dominates $A'$
in $r$, $A'$ becomes exogenous: $r' \,{:}{-}\, A'^{x}(x,y), B(y), C(z), W^{x}(x,y,z)$. Query $r'$ has no triad
and thus is easy.
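
The step from $r$ to $r'$ can be checked mechanically. The following Python sketch is illustrative only: the dictionary representation and the function name `make_dominated_exogenous` are assumptions made here, and domination is taken to mean that the variables of one endogenous atom form a proper subset of the variables of another (a simplification that sidesteps ties between atoms with identical variable sets).

```python
def make_dominated_exogenous(atoms, endogenous):
    """Return the endogenous atoms that remain after every dominated atom
    has been made exogenous.

    Assumption: atom b is dominated when some other endogenous atom a
    satisfies var(a) < var(b) (proper subset)."""
    return {
        b
        for b in endogenous
        if not any(a != b and atoms[a] < atoms[b] for a in endogenous)
    }

# Query r from the text: A'(x,y), B(y), C(z), W^x(x,y,z), with W already exogenous.
r = {"A'": {"x", "y"}, "B": {"y"}, "C": {"z"}, "W": {"x", "y", "z"}}
remaining = make_dominated_exogenous(r, {"A'", "B", "C"})
```

Only `B` and `C` stay endogenous, so fewer than three endogenous atoms remain and no triad can exist in $r'$.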


4.2. Induced rewrites preserve complexity 

We call the transformation $(q_{\mathrm{T}}; \{x \to y\}) \rightsquigarrow (r; \{x \to y\})$ an induced rewrite.^10 Induced
rewrites are key to understanding the effect of FDs on the complexity of resilience.


10 Transformations of queries called rewrites were defined in [Meliou et al. 2010]. An induced rewrite is a
rewrite that is induced by an FD.

Definition 4.4 (induced rewrite; closed query). Given a set of functional
dependencies $\Phi$ and a query $q$, we write $(q; \Phi) \rightsquigarrow (q'; \Phi)$ to mean that $q'$ is the result of
adding the dependent variable $u$ to some relation that contains all the determinant
variables $\mathbf{v}$ for some $\mathbf{v} \to u \in \Phi$. We use $\rightsquigarrow^{*}$ to indicate zero or more applications of
$\rightsquigarrow$. If $(q; \Phi) \rightsquigarrow^{*} (q^*; \Phi)$ and no more induced rewrites can be applied to $(q^*; \Phi)$, then we call
$(q^*; \Phi)$ a closed query and we say that $(q^*; \Phi)$ is the closure of $(q; \Phi)$.
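
As a minimal sketch of how a closure might be computed, the Python function below applies induced rewrites until none is applicable. The representation (a dictionary from atom names to variable sets, and FDs as pairs of a determinant set and a dependent variable) and the name `closure` are assumptions made for illustration.

```python
def closure(atoms, fds):
    """Apply induced rewrites until the query is closed.

    atoms: dict mapping atom name -> set of variables
    fds:   list of (determinant_variables, dependent_variable) pairs
    Returns a new dict; the input is left unchanged."""
    closed = {name: set(variables) for name, variables in atoms.items()}
    changed = True
    while changed:
        changed = False
        for determinant, dependent in fds:
            for variables in closed.values():
                # An induced rewrite applies when the atom holds all
                # determinant variables but not yet the dependent one.
                if set(determinant) <= variables and dependent not in variables:
                    variables.add(dependent)
                    changed = True
    return closed

tripod = {"A": {"x"}, "B": {"y"}, "C": {"z"}, "W": {"x", "y", "z"}}
closed_tripod = closure(tripod, [({"x"}, "y")])
```

For the tripod query with the FD $x \to y$, the only atom that changes is `A`, which becomes `A'(x, y)`, matching the query $r$ discussed above.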


This paper began as an attempt to determine whether the dichotomy for
responsibility of sj-free CQs [Meliou et al. 2010] continues to hold in the presence of FDs.
In studying the effect of FDs, we defined induced rewrites and proved that induced
rewrites preserve the complexity of responsibility. We conjectured that once we have
reached a closed query, all the effect of the FDs on the complexity of responsibility has
been exhausted and thus there is no further change if we delete all the FDs. We were
able to prove this conjecture for unary FDs, i.e., those of the form $v \to u$ where $v$ is a
single variable.

However, we had great difficulty proving this conjecture for all FDs. We studied the
responsibility problem more carefully and found that responsibility is quite delicate.
In particular, we discovered an error in Lemma 4.10 of [Meliou et al. 2010], namely
that Prop. 3.8 (in the present paper) does not hold for responsibility.

We identified resilience as a better-behaved notion than responsibility and we char¬ 
acterized the complexity of resilience via triads. Once we had done that, we were able 
to use the notion of triads to prove our conjecture about closed queries and thus prove 
the dichotomy theorem for resilience in the presence of arbitrary FDs. We give that 
proof shortly. 

With our improved insight from resilience, we went back and proved the dichotomy
for responsibility (Theorem 5.18) and finally showed that it holds as well in the
presence of FDs (Theorem 5.20).

We first show that induced rewrites preserve the complexity of resilience. 


Lemma 4.5 (Induced rewrites preserve complexity). Let $q$ be a query, $\Phi$ a
set of functional dependencies, and $q'$ the result of an induced rewrite, i.e.,
$(q; \Phi) \rightsquigarrow (q'; \Phi)$. Then $\mathrm{RES}(q'; \Phi) \equiv \mathrm{RES}(q; \Phi)$.

PROOF. Let the change from $q$ to $q'$ be the transformation of the atom $B$ to the new
atom $B'$ caused by adding variable $u$ to $B$ where $(\mathbf{v} \to u) \in \Phi$ and $\mathbf{v} \subseteq \mathrm{var}(B)$.

(a) $\mathrm{RES}(q'; \Phi) \le \mathrm{RES}(q; \Phi)$: Suppose we are given $(D', k)$ where $D'$ satisfies $\Phi$. Let $D$
be the result of projecting out the $u$ entry from $B'$. Note that $D$ still satisfies $\Phi$.
Furthermore, the set of witnesses that $D \models q$ is identical to the set of witnesses
that $D' \models q'$ and the sizes of all minimum contingency sets are unchanged. This is
because the effect of the tuple $B(\mathbf{t})$ in a contingency set in $D$ is identical to the effect
of the tuple $B'(\mathbf{t}')$ in the corresponding contingency set in $D'$, where $\mathbf{t}'$ is the result
of adding to $\mathbf{t}$ the unique $u$-attribute which is determined by the $\mathbf{v}$-attributes of $\mathbf{t}$.
Thus the map $(D', k) \mapsto (D, k)$ is a reduction of $\mathrm{RES}(q'; \Phi)$ to $\mathrm{RES}(q; \Phi)$.

(b) $\mathrm{RES}(q; \Phi) \le \mathrm{RES}(q'; \Phi)$: We are given $(D, k)$ where $D$ satisfies $\Phi$. Let $B'$ be the set
of tuples resulting from adding to each tuple $\mathbf{t}$ from $B$ the uniquely determined
$u$-attribute, $c$. In symbols,

$B' = \{(\mathbf{t}, c) \mid B(\mathbf{t}) \in D \wedge \exists \mathbf{s} \in D\, (\pi_{\mathbf{v}}(\mathbf{s}) = \pi_{\mathbf{v}}(\mathbf{t}) \wedge c = \pi_{u}(\mathbf{s}))\}$

For the same reason as above, the witnesses of $q'$ in $D'$ are the same as the witnesses
of $q$ in $D$ and the sizes of all minimum contingency sets are unchanged. Thus the
map $(D, k) \mapsto (D', k)$ is a reduction of $\mathrm{RES}(q; \Phi)$ to $\mathrm{RES}(q'; \Phi)$. □

It follows immediately that applying any set of induced rewrites preserves the
complexity of resilience:

COROLLARY 4.6. If $(q; \Phi) \rightsquigarrow^{*} (q'; \Phi)$, then $\mathrm{RES}(q'; \Phi) \equiv \mathrm{RES}(q; \Phi)$.
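
The two database maps used in the proof of Lemma 4.5 can be illustrated with a small Python sketch. Everything here is an assumption made for illustration: tuples are modelled as dictionaries from variable names to values, `source` stands for the tuples of some relation that already contains both the determinant variables and $u$, and the names `extend` and `project_out` are hypothetical.

```python
def extend(B, source, v, u):
    """Direction (b): add to each tuple t of B the u-value determined by its
    v-values. `source` holds tuples that carry both the v-attributes and u.
    Tuples of B with no matching source tuple are dropped, as in the set
    comprehension defining B'."""
    lookup = {tuple(s[x] for x in v): s[u] for s in source}
    return [
        {**t, u: lookup[tuple(t[x] for x in v)]}
        for t in B
        if tuple(t[x] for x in v) in lookup
    ]

def project_out(B_ext, u):
    """Direction (a): drop the u entry from every tuple of B'."""
    return [{x: val for x, val in t.items() if x != u} for t in B_ext]

# Tripod example with the FD x -> y: A(x) becomes A'(x, y).
A = [{"x": 1}, {"x": 2}]
W = [{"x": 1, "y": 10, "z": 5}, {"x": 2, "y": 20, "z": 6}]
A_ext = extend(A, W, ["x"], "y")
```

Because the FD fixes the $y$-value for every $x$-value, the two maps are inverse to each other on this instance, which is the reason the witness sets coincide.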


4.3. For closed queries, FDs are superfluous 

Recall that our current goal is to determine whether the dichotomy of the complexity
of resilience remains true in the presence of FDs. The following is a natural conjecture
which would give an affirmative answer to this question.


Conjecture 4.7 (Induced rewrites suffice). Let $(q^*; \Phi)$ be a closed query,
i.e., it is closed under induced rewrites. Then $\mathrm{RES}(q^*; \Phi) \equiv \mathrm{RES}(q^*)$.


It is fairly easy to see that Conjecture 4.7 holds when all the FDs in $\Phi$ are unary, i.e.,
of the form $v \to u$, with $v$ a single variable. However, we were stumped about how to
prove this for general FDs. This led to our more careful analysis of the complexity of
responsibility, our definition of resilience, and our characterization of the complexity
of resilience via triads (Theorem 3.24). Now we will use that analysis to prove that the
complexity of a closed query is NP-complete if it contains a triad, and in PTIME
otherwise. Thus Conjecture 4.7 is true and the dichotomy for the complexity of resilience
remains true in the presence of FDs.


Lemma 4.8 (Closed queries with triads are hard). Let $(q^*; \Phi)$ be a closed
sj-free CQ all of whose dominated atoms are exogenous. If $q^*$ has a triad, then $\mathrm{RES}(q^*; \Phi)$
is NP-complete.


PROOF. Let $(q^*; \Phi)$ be as in the statement of the lemma. Recall that we proved in
Lemma 3.10 that $\mathrm{RES}(q_\triangle) \le \mathrm{RES}(q^*)$ and thus $\mathrm{RES}(q^*)$ is NP-complete. Let $f$ be the
reduction we produced from $\mathrm{RES}(q_\triangle)$ to $\mathrm{RES}(q^*)$. We will now show that if $f(D, k) =
(D', k')$ then $D' \models \Phi$. It will then follow that $f$ is a reduction from $\mathrm{RES}(q_\triangle)$ to $\mathrm{RES}(q^*; \Phi)$.
Thus $\mathrm{RES}(q^*; \Phi)$ is NP-complete as claimed.

To see why $D' \models \Phi$, we will recall the definition of the reduction in the proof of
Lemma 3.10. But first, we will examine how $q_\triangle$ (Example 3.2) itself is affected by FDs.
In particular, let $\Phi_0$ be any set of FDs for which $(q_\triangle; \Phi_0)$ is closed under induced
rewrites. Notice that since $q_\triangle$ is closed, there can be no nontrivial unary FDs such as
$x \to y$ (otherwise, $T(z,x)$ would have been replaced by $T'(z,x,y)$) nor any nontrivial
binary FDs such as $xy \to z$ (otherwise $R(x,y)$ would have been replaced by $R'(x,y,z)$).
In fact, $\Phi_0$ has no nontrivial FDs, i.e., $\Phi_0 = \emptyset$.

Now recall the reduction from $\mathrm{RES}(q_\triangle)$ to $\mathrm{RES}(q^*)$ in the proof of Lemma 3.10. What
that proof did was to embed $q_\triangle$ into $q^*$. Using the triad of $q^*$, $\mathcal{T} = \{S_0, S_1, S_2\}$, we
partitioned the variables of $q^*$ into 7 sets, and for each assignment of $x, y, z$ to values
$a, b, c \in \mathrm{dom}(D)$, we made assignments according to that partition (see Equation 3.14).

The net effect is that, just as for $q_\triangle$, since $(q^*; \Phi)$ is closed, it must be the case that
$D' \models \Phi$. In particular, suppose that $\Phi$ contains the FD $\mathbf{u} \to v$. First suppose that $\mathbf{u}$ is
contained in one of the 7 sets of the partition (see Equation 3.14). Then, since $(q^*; \Phi)$ is
closed, $v$ must be in the same set and thus it has exactly the same value as each of the
variables in $\mathbf{u}$. If $\mathbf{u}$ has a variable from $V_3$ ($\mathrm{var}(q) - (\mathrm{var}(S_0) \cup \mathrm{var}(S_1) \cup \mathrm{var}(S_2))$) then its
value is $\langle abc \rangle$ so it determines all other variables. Similarly, if $\mathbf{u}$ has variables from two
of $V_0, V_1, V_2$ then it again determines all three values. Suppose $\mathbf{u}$ does not determine
all three values, e.g., say it does not determine $c$. Then, looking at Equation 3.14, we
see that all the variables of $\mathbf{u}$ are from $V_0$, $V_4$ or $V_5$, i.e., they are all from $\mathrm{var}(S_0)$. But
then since $(q^*; \Phi)$ is closed, $v$ must be in $\mathrm{var}(S_0)$ as well, and thus it is determined by $a$
and $b$.

Thus, we have shown that the reduction $f$ is also a reduction from $\mathrm{RES}(q_\triangle)$ to
$\mathrm{RES}(q^*; \Phi)$ and thus the latter problem is NP-complete. □


4.4. Dichotomy of resilience with FDs 

Recall that FDs cannot increase the complexity of resilience and thus if $q$ has no triad,
then $\mathrm{RES}(q; \Phi) \in$ PTIME (Cor. 4.2). Thus, we have succeeded in proving the dichotomy
for resilience in the presence of FDs:


THEOREM 4.9 (FD Dichotomy). Let $(q; \Phi)$ be an sj-free CQ with functional
dependencies. Let $(q^*; \Phi)$ be its closure under induced rewrites, and such that all dominated
atoms of $q^*$ are exogenous. If $q^*$ has a triad then $\mathrm{RES}(q; \Phi)$ is NP-complete. Otherwise,
$\mathrm{RES}(q; \Phi) \in$ PTIME.

Note that we have thus also proved Conjecture 4.7:

Corollary 4.10 (Induced rewrites suffice). Let $(q; \Phi)$ be an sj-free CQ with
functional dependencies, and let $q^*$ be the closure of $q$ under induced rewrites. Then,
$\mathrm{RES}(q; \Phi) \equiv \mathrm{RES}(q^*; \Phi) \equiv \mathrm{RES}(q^*)$.


5. COMPLEXITY OF RESPONSIBILITY 

We now develop and prove the analogous characterizations of the complexity of respon¬ 
sibility. As we will see, responsibility is a bit more delicate than resilience, but in the 
end the final theorems are similar. 

We first concentrate on the difference between resilience and responsibility. Recall
the queries $q_{\mathrm{rats}}$ and $q_{\mathrm{r^{x}at^{x}s}}$ (Example 3.2 and Eq. 3.9). We saw earlier that $\mathrm{RES}(q_{\mathrm{rats}})$
is in PTIME (Cor. 3.22). The reason is that atom $A$ dominates $R$ and $T$ and thus the
complexity of $\mathrm{RES}(q_{\mathrm{rats}})$ is unchanged when we make $R$ and $T$ exogenous (Prop. 3.8),
i.e., $\mathrm{RES}(q_{\mathrm{rats}}) \equiv \mathrm{RES}(q_{\mathrm{r^{x}at^{x}s}})$. Obviously $q_{\mathrm{r^{x}at^{x}s}}$ is triad-free. Thus, by Theorem 4.2,
$\mathrm{RES}(q_{\mathrm{r^{x}at^{x}s}})$ and $\mathrm{RES}(q_{\mathrm{rats}})$ are in PTIME. We now show, however, that $\mathrm{RSP}(q_{\mathrm{rats}})$ is
NP-complete.


PROPOSITION 5.1 ($q_{\mathrm{rats}}$ IS HARD FOR RSP). $\mathrm{RSP}(q_{\mathrm{rats}})$ is NP-complete.


PROOF. We reduce 3SAT to $\mathrm{RSP}(q_{\mathrm{rats}})$. Let $\psi$ be a 3-CNF formula with variables
$v_1, \ldots, v_n$ and clauses $C_1, \ldots, C_m$. The reduction will map $\psi$ to $f(\psi) = (D, s_0, k)$ with
$s_0 = S(b_0, c_0)$, where we will construct $D = (A, R, S, T)$ to have a contingency set for $s_0$
of size $k$ iff $\psi \in$ 3SAT (we explain the choice of value $k$ later in the proof). We let $a_0$ be
the unique element of the domain of $D$ that joins with $s_0$.

In $q_{\mathrm{rats}}$, $A$ dominates $R$, but when we are building a contingency set $\Gamma$ for $s_0$, we may
require some tuples of the form $R(a_0, b)$. Note that these cannot be replaced by the
tuple $A(a_0)$, because that would remove the only witness $(a_0, b_0, c_0)$ that contains our
tuple $s_0$. This explains why $\mathrm{RES}(q_{\mathrm{rats}}) \in$ PTIME while $\mathrm{RSP}(q_{\mathrm{rats}})$ is NP-complete, and it
is the key idea behind the reduction we now produce.
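
The gap between the two problems can be seen on a tiny instance by brute force. The sketch below is illustrative and exponential in the database size; the database, the helper names, and the encoding of tuples as `(relation, values)` pairs are all assumptions made here, not part of the reduction.

```python
from itertools import combinations

def witnesses(db):
    """Witnesses (a, b, c) of q_rats :- A(x), R(x,y), S(y,z), T(z,x)."""
    return [
        (a, b, c)
        for (a,) in db["A"]
        for (a2, b) in db["R"] if a2 == a
        for (b2, c) in db["S"] if b2 == b
        if (c, a) in db["T"]
    ]

def remove(db, gamma):
    """Database obtained by deleting the (relation, tuple) pairs in gamma."""
    return {n: {t for t in rel if (n, t) not in gamma} for n, rel in db.items()}

def all_tuples(db):
    return [(n, t) for n, rel in db.items() for t in rel]

def resilience(db):
    """Size of a smallest set of tuples whose removal leaves no witness."""
    tuples = all_tuples(db)
    for k in range(len(tuples) + 1):
        for gamma in combinations(tuples, k):
            if not witnesses(remove(db, set(gamma))):
                return k

def min_contingency(db, target):
    """Smallest gamma such that the query still holds after removing gamma,
    but fails once `target` is removed as well."""
    others = [x for x in all_tuples(db) if x != target]
    for k in range(len(others) + 1):
        for gamma in combinations(others, k):
            g = set(gamma)
            if witnesses(remove(db, g)) and not witnesses(remove(db, g | {target})):
                return k, g
    return None

db = {
    "A": {("a0",)},
    "R": {("a0", "b0"), ("a0", "b1"), ("a0", "b2")},
    "S": {("b0", "c0"), ("b1", "c1"), ("b2", "c2")},
    "T": {("c0", "a0"), ("c1", "a0"), ("c2", "a0")},
}
s0 = ("S", ("b0", "c0"))
size, gamma = min_contingency(db, s0)
```

Here removing `A(a0)` alone falsifies the query, so the resilience is 1. A contingency set for `s0` must keep the witness through `s0` alive, so it cannot contain `A(a0)` and needs two tuples, one for each of the other two witnesses.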

For each variable $v_i$ occurring in $\psi$, we build the gadget $G_i$ as follows: $G_i$ consists
of $2t$ values $b^i_j$ for $y$ and $2t$ values $c^i_j$ for $z$ ($1 \le j \le 2t$), where $t$ is a constant to be
specified later. We include the $2t$ pairs $R(a_0, b^i_j)$ and the $2t$ pairs $T(c^i_j, a_0)$, $1 \le j \le 2t$.
(See Fig. 10, where these pairs are drawn as edges from $a_0$ to each $b^i_j$ and from each $c^i_j$
to $a_0$, respectively. Notice that the value $a_0$ is shown twice for better illustration.)

Next, we include all the pairs $S(b^i_j, c^i_{j'})$, $1 \le j, j' \le t$. These are drawn in Fig. 10 as a
complete bipartite graph between the vertex sets $\{b^i_1, \ldots, b^i_t\}$ and $\{c^i_1, \ldots, c^i_t\}$.

Fig. 10: The $q_{\mathrm{rats}}$ variable gadget $G_i$ for variable $v_i$. Red, green, and blue lines
correspond to tuples from $R$, $S$, and $T$, respectively. Dotted lines will never need to be chosen
in minimum contingency sets of $f(\psi)$.

Fig. 11: The $q_{\mathrm{rats}}$ clause gadget corresponding to clause $C_s = v_1 \vee \bar{v}_2 \vee v_3$ and truth
assignment $\sigma_6 = \{(v_1, 1), (v_2, 1), (v_3, 0)\}$. $A(a_{s,6})$ must be in the minimum contingency
set unless the chosen truth assignment is $\sigma_6$.

Finally we add two matchings of size $t$, which we name the "$v_i$ matching" and the "$\bar{v}_i$
matching," respectively:

$v_i$ matching: $S(b^i_1, c^i_{t+1}), \ldots, S(b^i_t, c^i_{2t})$
$\bar{v}_i$ matching: $S(b^i_{t+1}, c^i_1), \ldots, S(b^i_{2t}, c^i_t)$

Notice that in Fig. 10 the $v_i$ matchings are connecting the upper left corner with the
lower right corner, whereas the $\bar{v}_i$ matchings are connecting the other two corners.

Any minimum contingency set must remove all of the witnesses from $G_i$. Such a
minimum contingency set must remove either all the pairs $R(a_0, b^i_1), \ldots, R(a_0, b^i_t)$ or
all the pairs $T(c^i_1, a_0), \ldots, T(c^i_t, a_0)$, i.e., one side or the other of the complete bipartite
graph. After this, $t$ witnesses remain, either involving the $v_i$ matching (if the $T(c^i_j, a_0)$'s
were chosen), or otherwise the $\bar{v}_i$ matching. Only the $S$-tuples will be useful for the
clause gadgets, so the optimal choice will be to choose the $t$ $S$-tuples marked $v_i$ or the $t$
$S$-tuples marked $\bar{v}_i$. Any minimum contingency set thus corresponds to a truth
assignment to the Boolean variables $v_1, \ldots, v_n$.

So far, we have described the gadgets $G_1, \ldots, G_n$ and shown that any minimum
contingency set for this part of $D$ corresponds to a truth assignment for the variables
$v_1, \ldots, v_n$. We next introduce the clause gadgets and choose the value $k$, so that
contingency sets for $D$ of size $k$ will correspond exactly to truth assignments that satisfy all
of the clauses of $\psi$.

We now describe the clause gadgets. Suppose, for example, that $C_s = v_1 \vee \bar{v}_2 \vee v_3$
with $s \in [m]$. Then 7 of the eight possible truth assignments to $v_1, v_2, v_3$ satisfy $C_s$,
i.e., all but the assignment $\sigma_2$ (010 in binary). For each of these 7 good assignments
$\sigma_i$, $i \in \{0, \ldots, 7\} - \{2\}$, we add an element $a_{s,i}$ to $A$ and we add the tuples to $R$ and
$T$ so that $a_{s,i}$ participates in three witnesses, each of which shares an $S$ tuple with a
witness from each of the three variable gadgets that agree with assignment $\sigma_i$. For
example, assignment $\sigma_6$ (110 in binary) makes $v_1, v_2$ true and $v_3$ false, so $a_{s,6}$ joins
with $S(b^1_{r(s,6)}, c^1_{t+r(s,6)})$, $S(b^2_{r(s,6)}, c^2_{t+r(s,6)})$, and $S(b^3_{t+r(s,6)}, c^3_{r(s,6)})$. Here $r(s, i)$ is a
function that chooses a unique element of the matching $v_j$ or $\bar{v}_j$ appropriate to assignment
$\sigma_i$ of clause $s$ (see Fig. 11).

The key property of the $C_s$ gadget is that, if the chosen truth assignment satisfies $C_s$,
then we do not need to worry about the $a_{s,i}$ corresponding to the chosen assignment,
and may choose only 6 $a_{s,i}$'s from $A$ for the contingency set. However, if the chosen
assignment does not satisfy $C_s$, then all 7 of the $a_{s,i}$'s must be chosen!

We can let $t = 8m$ and $k = (2t)n + 6m = (16n + 6)m$. Our construction ensures that
$(D, s_0, k) \in \mathrm{RSP}(q_{\mathrm{rats}})$ iff $\psi \in$ 3SAT. □


Notice that in the proof of Prop. 5.1 we showed that it is hard to compute the
responsibility for a tuple from $S$ in $\mathrm{RSP}(q_{\mathrm{rats}})$. The complexity of computing the responsibility
of a tuple can depend on which relation the tuple is chosen from. In the case of $q_{\mathrm{rats}}$,
responsibility is hard for tuples from all relations except for $A$.

The proof of Prop. 5.1 shows that domination does not work the same way for
responsibility as it does for resilience. In particular, the analogue of Prop. 3.8 (Domination for
Resilience) does not hold for responsibility.

We next show that a modified version of domination still works for responsibility.
Recall the query $q_{\mathrm{brats}}$ (Example 3.2) and define the query $q_{\mathrm{br^{x}ats}}$ as follows:


$q_{\mathrm{br^{x}ats}} \,{:}{-}\, A(x), R^{x}(x,y), B(y), S(y,z), T(z,x)$.   (5.2)

Notice that $\mathrm{var}(A) \subseteq \mathrm{var}(R)$ and $\mathrm{var}(B) \subseteq \mathrm{var}(R)$ and that also $\mathrm{var}(R) \subseteq \mathrm{var}(A) \cup
\mathrm{var}(B)$.

PROPOSITION 5.3 ($\mathrm{RSP}(q_{\mathrm{brats}})$). The complexity of responsibility for $q_{\mathrm{brats}}$ is
unchanged if we make $R$ exogenous, i.e.,

$\mathrm{RSP}(q_{\mathrm{brats}}) \equiv \mathrm{RSP}(q_{\mathrm{br^{x}ats}})$.

PROOF. Let $D \models q_{\mathrm{brats}}$ and let $t$ be a tuple that participates in a witness that $D \models
q_{\mathrm{brats}}$. We will show that there is a minimum contingency set $\Gamma'$ for $t$ that contains no
tuples from $R$. Let $\Gamma$ be a minimum contingency set for $t$ that contains as few tuples
from $R$ as possible. Suppose that $R(a_1, b_1) \in \Gamma$. Let $\mathbf{j}$ be a witness that $(D - \Gamma) \models
q_{\mathrm{brats}}$ and let $a_0, b_0, c_0$ be the projection of $\mathbf{j}$ onto components $x, y, z$, respectively. Thus,
$A(a_0)$, $R(a_0, b_0)$ and $B(b_0)$ are all in $D - \Gamma$. In particular, $R(a_1, b_1) \ne R(a_0, b_0)$. Let $\Gamma'$
be the result of replacing $R(a_1, b_1)$ by $A(a_1)$ if $a_1 \ne a_0$, and by $B(b_1)$ otherwise, in
which case $b_1 \ne b_0$. Thus $\Gamma'$ is still a minimum contingency set for $t$ and it contains
fewer tuples from $R$, contradicting the fact that $\Gamma$ had the fewest possible such tuples.
Thus, tuples from $R$ are never needed in any minimum contingency set for $t$. Thus, as
claimed, the complexity of $\mathrm{RSP}(q_{\mathrm{brats}})$ is unchanged when we make $R$ exogenous. □


We are now ready to formalize full domination, the version of domination that works
for responsibility the way that ordinary domination works for resilience. Our first
example is that in the query $q_{\mathrm{brats}}$, the relation $R$ is fully dominated because every
variable in $\mathrm{var}(R)$ is "covered" by some other endogenous relation (Prop. 5.3).^11 Here are
three more examples, $s_1, s_2, s_3$, where $R$ is fully dominated and one, $n_4$, where it is not.

$s_1 \,{:}{-}\, A(x), R(x,y,w), B(y), S(y,z), T(z,x)$
$s_2 \,{:}{-}\, A(x), R(x,y,w), Q^{x}(w), B(y), S(y,z), T(z,x)$
$s_3 \,{:}{-}\, A(x), R(x,y,w), Q^{x}(w,x), B(y), S(y,z), T(z,x)$
$n_4 \,{:}{-}\, A(x), R(x,y,w), Q^{x}(w,z), B(y), S(y,z), T(z,x)$   (5.4)


In a query $q$, call a variable $w \in \mathrm{var}(R)$ solitary if it cannot reach another endogenous
atom without following one of the edges in $\mathrm{var}(R) - \{w\}$. Note that in each of $s_1, s_2, s_3$,
the variable $w$ is solitary, but $w$ is not solitary in $n_4$.


Definition 5.5 (Full domination). Let $F$ be an atom of query $q$. $F$ is fully dominated
iff for all non-solitary variables $y \in \mathrm{var}(F)$ there is another atom $A$ such that $y \in
\mathrm{var}(A) \subseteq \mathrm{var}(F)$.


Observe that relation $R$ is fully dominated in $q_{\mathrm{brats}}$, as well as in $s_1, s_2, s_3$, but not in
$n_4$ (Eq. 5.4). On the other hand, $R$ is not fully dominated in $q_{\mathrm{rats}}$ because $y$ is connected
to $S(y, z)$ and thus not solitary and not covered by any smaller atom.
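
The two notions can be prototyped as follows. This Python sketch is an illustration under stated assumptions, not the paper's algorithm: queries are dictionaries from atom names to variable sets, the covering atom is required to be endogenous (as in the informal description preceding the definition), and the names `is_solitary` and `is_fully_dominated` are hypothetical.

```python
def is_solitary(w, F, atoms, endogenous):
    """w in var(F) is solitary if, starting from w and never crossing a
    variable in var(F) - {w}, no endogenous atom other than F is reachable."""
    blocked = atoms[F] - {w}
    frontier = [n for n, vs in atoms.items() if n != F and w in vs]
    seen = set(frontier)
    while frontier:
        cur = frontier.pop()
        if cur in endogenous:
            return False
        for n, vs in atoms.items():
            if n != F and n not in seen and (atoms[cur] & vs) - blocked:
                seen.add(n)
                frontier.append(n)
    return True

def is_fully_dominated(F, atoms, endogenous):
    """Every non-solitary variable y of F must lie in some other endogenous
    atom A with var(A) a subset of var(F)."""
    for y in atoms[F]:
        if is_solitary(y, F, atoms, endogenous):
            continue
        if not any(
            a != F and y in atoms[a] and atoms[a] <= atoms[F]
            for a in endogenous
        ):
            return False
    return True

endo = {"A", "R", "B", "S", "T"}   # Q, where present, is exogenous
base = {"A": {"x"}, "B": {"y"}, "S": {"y", "z"}, "T": {"z", "x"}}
rats = {"A": {"x"}, "R": {"x", "y"}, "S": {"y", "z"}, "T": {"z", "x"}}
brats = {**base, "R": {"x", "y"}}
s1 = {**base, "R": {"x", "y", "w"}}
s2 = {**base, "R": {"x", "y", "w"}, "Q": {"w"}}
s3 = {**base, "R": {"x", "y", "w"}, "Q": {"w", "x"}}
n4 = {**base, "R": {"x", "y", "w"}, "Q": {"w", "z"}}
```

Running the check on these dictionaries reproduces the classification stated above: `R` is fully dominated in `brats`, `s1`, `s2` and `s3`, but not in `n4` or `rats`.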

We now show that fully dominated atoms may be made exogenous. 


LEMMA 5.6 (FULL DOMINATION). Let $F$ be a fully dominated atom in an sj-free CQ
$q$. Let $q'$ be the modified query in which $F$ is made exogenous. Then $\mathrm{RSP}(q) \equiv \mathrm{RSP}(q')$.


PROOF. We have to show that $\mathrm{RSP}(q) \le \mathrm{RSP}(q')$ and $\mathrm{RSP}(q') \le \mathrm{RSP}(q)$. Suppose we
are given $(D, S(\mathbf{t}))$ and we are interested in the responsibility of tuple $S(\mathbf{t})$. There are
two cases. In each case, we will show how, given one of $k, k'$, to produce the other, such
that:


$(D, \mathbf{t}, k) \in \mathrm{RSP}(q) \iff (D', \mathbf{t}, k') \in \mathrm{RSP}(q')$   (5.7)

Case 1: $F \ne S$: We show that, as in the proof of Prop. 5.3, there is no need to include
any tuples from $F$ in a minimum contingency set $\Gamma$ for $q$ in $D$. As in that proof, we let
$\mathbf{j}$ be a witness for $(D - \Gamma) \models q$ and suppose that $F(\mathbf{f}) \in \Gamma$. Thus, $\mathbf{j}$ and $\mathbf{f}$ must disagree
on the assignment of at least one variable.


11 Contrast this with the definition of domination (Definition 3.7), which only requires that some subset of
the variables is covered by another relation.


(a): Suppose they differ on some non-solitary variable $y$ of $F$. Let $A$ be the atom
that covers $y$; we can then replace $F(\mathbf{f})$ by the tuple $\pi_{\mathrm{var}(A)}(\mathbf{f})$ of $A$. Thus, the sizes of
the minimum contingency sets on the two sides are identical and letting $k = k'$ and
$D = D'$, Eq. 5.7 holds.

(b): Suppose on the contrary that $\mathbf{j}$ and $\mathbf{f}$ agree on all the non-solitary variables of $F$.
Note that since $S$ is endogenous, no solitary variable of $F$ can occur in $S$.^12 Thus,
the only place that $\mathbf{j}$ and $\mathbf{f}$ disagree is on solitary variables of $F$, which do not occur
in $S$. Let $F(\mathbf{f}_0)$ be the tuple of $F$ that agrees with $\mathbf{j}$. Then $\mathbf{f}$ and $\mathbf{f}_0$ agree on all variables
except for solitary variables of $F$. Thus, since removing $S(\mathbf{t})$ from $D - (\Gamma - \{F(\mathbf{f})\})$
removes all witnesses of $D \models q$ that extend $\mathbf{f}_0$, it must also remove all witnesses that
extend $\mathbf{f}$, i.e., $\mathbf{f}$ is not useful so it does not occur in $\Gamma$.

Case 2: $F = S$: In this case, some tuples of $F$ may need to be in $\Gamma$. Let $I$ be the
solitary variables of $F$ and let $W = \{\mathbf{f} \in F \mid \mathbf{f} \text{ useful};\ \mathbf{f} \ne \mathbf{t} \wedge \pi_{\bar{I}}(\mathbf{f}) = \pi_{\bar{I}}(\mathbf{t})\}$. These
are the tuples of $F$ which agree with $\mathbf{t}$ on all but the solitary variables of $F$. $W$ must
be contained in every contingency set for $(D, \mathbf{t})$. Thus, we let $k = k' + |W|$ and $F' =
F - W$. Eq. 5.7 holds. (The point of $\mathbf{f}$ being useful in the definition of $W$ is that solitary
variables may occur in some exogenous relations which could already exclude certain
values, and thus tuples with those values are not useful so they do not need to be in
the contingency set.) □


5.1. Triads and hardness 

Now that we have established that full domination works for responsibility, we proceed 
to prove a complexity dichotomy for responsibility. 

When studying responsibility, we will insist from now on that every fully dominated
atom is exogenous. For example, $q_{\mathrm{rats}}$ has no fully dominated atoms, so it is already in
its normal form and it has a triad, $\{R, S, T\}$. Note that we cannot have two elements in
a triad such that $\mathrm{var}(S_1) \subseteq \mathrm{var}(S_2)$ because removing $\mathrm{var}(S_2)$ would isolate $S_1$. Thus
$\{R, S, T\}$ is the unique triad of $q_{\mathrm{rats}}$. On the other hand, $R$ is fully dominated in $q_{\mathrm{brats}}$,
so we transform it to the triad-free $q_{\mathrm{br^{x}ats}}$ (Eq. 5.2).

We now show that $\mathrm{RSP}(q)$ is NP-complete if $q$ has a triad. Then we will show that
otherwise $\mathrm{RSP}(q) \in$ PTIME (Cor. 5.17). The proofs will take the same form as for
resilience; however, the following proof is slightly more subtle than the analogous result
for resilience.


LEMMA 5.8 (TRIADS MAKE $\mathrm{RSP}(q)$ HARD). Let $q$ be an sj-free CQ where all fully
dominated atoms are exogenous. If $q$ has a triad, then $\mathrm{RSP}(q)$ is NP-complete.

PROOF. Depending on which of the following cases the query falls into, we build a
reduction to $\mathrm{RSP}(q)$ from $\mathrm{RSP}(q_\triangle)$, $\mathrm{RSP}(q_{\mathrm{rats}})$ or $\mathrm{RSP}(q_{\mathrm{T}})$. Let $\mathcal{T} = \{S_0, S_1, S_2\}$ be a triad in
query $q$.

Case 1: There is no endogenous atom $A$ such that $\mathrm{var}(A) \subseteq \mathrm{var}(S_i) \cap \mathrm{var}(S_j)$, for
some $i \ne j$. We will show that $\mathrm{RSP}(q_\triangle) \le \mathrm{RSP}(q)$.

Given $D, t, k$ we must produce $D', t', k'$ such that

$(D, t, k) \in \mathrm{RSP}(q_\triangle) \iff (D', t', k') \in \mathrm{RSP}(q)$.   (5.9)

Note that we may assume that $t = R(a_0, b_0)$ for some values $a_0, b_0$, i.e., that $t$ is
a tuple from $R$, because we know that $\mathrm{RSP}(q_\triangle)$ is hard no matter which relation we
choose the tuple from (Prop. 2.8).

In this case, we construct $D'$ exactly as we did in Lemma 3.10 (Cases 1 or 2), and as
we did there, we let $k' = k$. The only difference is that we must define $t'$ from $t$. This is
easy: recall that $t = R(a_0, b_0)$. We let $t' = S_0(\langle a_0 b_0 \rangle, a_0, b_0)$, i.e., the corresponding tuple
of $S_0$. Thus, we have exactly simulated $q_\triangle$ in $q$, so Eq. 5.9 holds.


12 We are allowing the computation of the responsibility of tuples from exogenous relations just to make the
proofs simpler. Notice that we never change the relation S whose tuples we are computing the responsibility
of. Thus, if we must make S exogenous, we do so as the last fully-dominated atom we make exogenous.

Fig. 12: Case 3 of the proof of Lemma 5.8. There is a tripod sitting in the hypergraph
of $q$.

Case 2: There is an endogenous atom $A$ and some $i \ne j$, such that $\mathrm{var}(A) \subseteq \mathrm{var}(S_i) \cap
\mathrm{var}(S_j)$, but only for a unique pair $i \ne j$. We show that $\mathrm{RSP}(q_{\mathrm{rats}}) \le \mathrm{RSP}(q)$. Let the pair
be 0, 2, i.e., $\mathrm{var}(A) \subseteq \mathrm{var}(S_0) \cap \mathrm{var}(S_2)$.

Again, we are given $D, t, k$, where $t = R(a_0, b_0)$. We produce $D', t'$, but now such that

$(D, t, k) \in \mathrm{RSP}(q_{\mathrm{rats}}) \iff (D', t', k) \in \mathrm{RSP}(q)$.   (5.10)

We produce $D'$ and $t'$ exactly as in Case 1, and we again have that all the witnesses
and minimum contingency sets for $q_{\mathrm{rats}}$ wrt $D, t$ are preserved for $q$ wrt $D', t'$. Thus
Eq. 5.10 holds.

Finally, we are left with

Case 3: There are endogenous atoms $A, B$ such that WLOG $\mathrm{var}(A) \subseteq \mathrm{var}(S_0) \cap
\mathrm{var}(S_2)$, and $\mathrm{var}(B) \subseteq \mathrm{var}(S_0) \cap \mathrm{var}(S_1)$.

We know that $S_0$ is not fully dominated. Thus, there must exist a non-solitary
variable $w \in \mathrm{var}(S_0)$ such that $w \notin \mathrm{var}(A) \cup \mathrm{var}(B)$. Since $w$ is non-solitary, there
must be an endogenous atom $C \ne S_0$ such that $C$ is reachable from $S_0$ without using
edges from $\mathrm{var}(A) \cup \mathrm{var}(B)$. Thus we have located a tripod sitting in the hypergraph
of $q$ (see Fig. 12). It thus follows from Prop. 3.5 that $\mathrm{RSP}(q)$ is NP-complete as well. □


5.2. The polynomial case 

As we saw in the previous section, the presence of triads in a query makes its
responsibility problem NP-complete. In the responsibility setting we require full domination
to make an atom exogenous. This means that more atoms may remain endogenous, so
there can be more triads. The query $q_{\mathrm{rats}}$ is an example: for resilience we use
domination and after applying domination, $q_{\mathrm{rats}}$ has no triads and thus $\mathrm{RES}(q_{\mathrm{rats}}) \in$ PTIME.
However, if we may only apply full domination, then $q_{\mathrm{rats}}$ keeps the triad $R, S, T$ and
thus $\mathrm{RSP}(q_{\mathrm{rats}})$ is NP-complete.

We now want to prove the polynomial case for responsibility. Recall that in the proof
of Lemma 3.23, we showed the following:


COROLLARY 5.11. Let $q$ be a CQ that has no triad. Then we can transform $q$, via a
series of dissociations, to a linear query $q'$.

Then, since dissociations cannot make the resilience problem of an sj-free CQ easier
(Lemma 3.21), it followed that $\mathrm{RES}(q) \in$ PTIME for any such triad-free query $q$.

To prove that for any triad-free, sj-free CQ $q$, $\mathrm{RSP}(q) \in$ PTIME, it suffices to prove that
dissociations cannot make the responsibility problem of such queries easier. As we see
next, there is a surprising complication to this proof, which gives us an unexpected
bonus result.


5.3. A generalization of responsibility 

We want to prove that if $q'$ is obtained from $q$ through dissociation, then $\mathrm{RSP}(q) \le
\mathrm{RSP}(q')$. In the proof of the similar result for resilience we did the following. We let
$R^{x}(\mathbf{z})$ be the atom that was changed to $R'^{x}(\mathbf{z}, v)$. We then reduced $\mathrm{RES}(q)$ to $\mathrm{RES}(q')$ by
mapping $(D, k)$ to $(D', k)$ where $D'$ is the same as $D$ with the exception that we let
$R' = \{(\mathbf{t}, d) \mid R(\mathbf{t}) \in D;\ d \in \mathrm{dom}(D)\}$. This transformation does not change the witness
set nor the contingency sets, because, by the way we formed $R'$ from $R$, the conjunct
$R'(\mathbf{z}, v)$ places the same restriction on $D'$ that $R(\mathbf{z})$ places on $D$.
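
The database side of this dissociation step is a simple cross product with the active domain. The Python sketch below is illustrative only; the encoding of a database as a dictionary from relation names to sets of value tuples, and the name `dissociate`, are assumptions made here.

```python
def dissociate(db, rel):
    """Return a copy of db in which every tuple t of `rel` is replaced by
    the tuples (t, d) for all values d in the active domain of db."""
    domain = {value for tuples in db.values() for t in tuples for value in t}
    out = dict(db)
    out[rel] = {t + (d,) for t in db[rel] for d in domain}
    return out

db = {"R": {(1,)}, "S": {(1, 2)}}
db2 = dissociate(db, "R")
```

Since the new column ranges over the whole domain, it never rules out a join partner, which is why the witnesses and contingency sets are unaffected.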

This proof goes through fine for responsibility except in one case, namely if the tuple
$t$ that we are computing the responsibility of belongs to $R$, the exogenous relation to
which we have added the new variable, $v$.^13

When $t \in R$, we would like to transform it to $t' \in R'$ by appending a value, $a$,
corresponding to the new variable, $v$. However, this will change responsibility in an unclear
way. In particular, the responsibility of $t$ does not correspond to the responsibility of
$(t, a)$ for any particular $a$. It rather corresponds to the responsibility of $(t, a)$ for all possible
$a$'s.

To solve our problem, we need to generalize the notion of responsibility to include 
wildcards. 

Definition 5.12 (tuples with wildcards). Let $D$ be a database containing a relation
$R(x_1, \ldots, x_c)$. Let $\tau = (s_1, \ldots, s_c)$ be a tuple such that each $s_i \in \mathrm{dom}(D) \cup \{*\}$, i.e., $\tau$
may have elements in the domain in some coordinates and the wildcard, $*$, in others.
We call $\tau$ a tuple with wildcards. We say that a tuple $(a_1, \ldots, a_c) \in R$ matches $\tau$ iff for
all $i$, $a_i = s_i$ or $s_i = *$. When $D$ and $R$ are understood, $\tau$ represents a set of tuples from
$R$, $\langle\tau\rangle = \{\mathbf{a} \in R \mid \mathbf{a} \text{ matches } \tau\}$.

For example, the tuple with wildcard, (a,*), matches all pairs from R whose first 
coordinate is a. We generalize responsibility to allow us to compute the responsibility 
of a set of tuples denoted by a tuple with wildcards: 

Definition 5.13 ($\mathrm{RSP}^*$). Let $D$ be a database containing a relation $R$, $q$ a query for
$D$ and $\tau$ a tuple with wildcards. Then $(D, \tau, k) \in \mathrm{RSP}^*(q)$ iff there exists a contingency
set $\Gamma$ of size $k$ such that $(D - \Gamma) \models q$ and $(D - (\Gamma \cup \langle\tau\rangle)) \not\models q$.
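
To make the two definitions concrete, here is a brute-force Python sketch for the chain query A(x), R(x,y), S(y,z) on the database given in the caption of Fig. 13. It is an exponential-time illustration only; the helper names and the encoding of tuples as `(relation, values)` pairs are assumptions made here. The search skips candidate sets that contain tuples matching the wildcard tuple, since including them only makes the first condition harder to satisfy.

```python
from itertools import combinations

WILDCARD = "*"

def matches(tup, pattern):
    """A tuple matches a pattern if they agree on every non-wildcard position."""
    return all(s == WILDCARD or a == s for a, s in zip(tup, pattern))

def witnesses(db):
    """Witnesses (a, b, c) of q :- A(x), R(x,y), S(y,z)."""
    return [
        (a, b, c)
        for (a,) in db["A"]
        for (a2, b) in db["R"] if a2 == a
        for (b2, c) in db["S"] if b2 == b
    ]

def remove(db, gamma):
    return {n: {t for t in rel if (n, t) not in gamma} for n, rel in db.items()}

def min_contingency(db, rel, pattern):
    """Smallest size of a set gamma such that the query holds on db - gamma
    but fails once all tuples of `rel` matching `pattern` are removed too."""
    target = {(rel, t) for t in db[rel] if matches(t, pattern)}
    others = [(n, t) for n, r in db.items() for t in r if (n, t) not in target]
    for k in range(len(others) + 1):
        for gamma in combinations(others, k):
            g = set(gamma)
            if witnesses(remove(db, g)) and not witnesses(remove(db, g | target)):
                return k
    return None

db = {
    "A": {("a1",), ("a2",), ("a3",)},
    "R": {("a1", "b1"), ("a1", "b2"), ("a2", "b2"), ("a3", "b3")},
    "S": {("b1", "c1"), ("b1", "c2"), ("b2", "c2"), ("b3", "c3")},
}
k = min_contingency(db, "R", ("a1", WILDCARD))
```

For the wildcard tuple `R(a1, *)` the result is 2: the witnesses through `a2` and `a3` share no tuple, so each needs its own deletion, while a witness through `a1` survives.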


13 The reader may wonder why we might need to compute the responsibility of an exogenous tuple. The
answer is that the tuple originally might have come from an endogenous relation which we transformed to
an exogenous one using full domination.

Fig. 13: $N_{w,\tau}(q, D)$; $w = A(a_1), R(a_1, b_2), S(b_2, c_2)$; $\tau = R(a_1, *)$. This is an example
of Network Flow in the proof of Lemma 5.16 for query $q \,{:}{-}\, A(x), R(x,y), S(y,z)$ and
database $D = (A, R, S)$, where $A = \{a_1, a_2, a_3\}$, $R = \{(a_1, b_1), (a_1, b_2), (a_2, b_2), (a_3, b_3)\}$,
$S = \{(b_1, c_1), (b_1, c_2), (b_2, c_2), (b_3, c_3)\}$.


Since $\mathrm{RSP}^*(q)$ is just a generalization of $\mathrm{RSP}(q)$ it is immediate that $\mathrm{RSP}(q) \le \mathrm{RSP}^*(q)$.
Thus, $\mathrm{RSP}^*(q)$ is NP-complete whenever $\mathrm{RSP}(q)$ is:

COROLLARY 5.14. Let $q$ be an sj-free CQ all of whose fully dominated atoms are
exogenous. If $q$ has a triad then $\mathrm{RSP}^*(q)$ is NP-complete.

From our previous discussion, it now follows that dissociation does not make $\mathrm{RSP}^*(q)$
easier:


LEMMA 5.15. If $q'$ is obtained from $q$ through dissociation, then $\mathrm{RSP}^*(q) \le \mathrm{RSP}^*(q')$.

Furthermore, linear queries are still easy for responsibility:


LEMMA 5.16. For any linear sj-free CQ $q$, $\mathrm{RSP}^*(q)$ is in PTIME.


Proof. The proof is a small modification of the proof for Fact 3.18. As before, we
use network flow to compute the min cut over all w extending any element of (r) of the
network N_{w,r}(q, D). This new network has weight ∞ for every edge in w − (r) and 0
for every edge in (r). See Fig. 13. □
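
The construction in this proof can be sketched in Python as follows. This is a hedged illustration rather than the authors' implementation: it is hard-wired to the linear query q :− A(x), R(x, y), S(y, z), it assumes that every tuple is endogenous with weight 1, and it replaces the weight ∞ by a finite constant larger than any finite cut. The function names (max_flow, min_cut_for_witness, rsp_star_by_flow) and the fact encoding are hypothetical. Each tuple becomes one edge, so a minimum cut corresponds to a minimum set of deleted tuples.

```python
from collections import defaultdict, deque

def max_flow(cap, s, t):
    """Edmonds-Karp max flow on a dict-of-dicts of residual capacities (mutated)."""
    flow = 0
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:  # BFS for a shortest augmenting path
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow
        path, v = [], t
        while parent[v] is not None:  # walk back from the sink to the source
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(cap[u][v] for u, v in path)
        for u, v in path:
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck
        flow += bottleneck

def min_cut_for_witness(db, w, wild):
    """Min cut of the network for witness w: 0 on (r), 'infinite' on w - (r), 1 elsewhere."""
    big = len(db) + 1  # stands in for infinity: larger than any finite cut
    cap = defaultdict(lambda: defaultdict(int))
    for fact in db:
        rel, vals = fact
        c = 0 if fact in wild else big if fact in w else 1
        if rel == "A":
            u, v = "s", ("x", vals[0])
        elif rel == "R":
            u, v = ("x", vals[0]), ("y", vals[1])
        else:  # rel == "S"
            u, v = ("y", vals[0]), ("z", vals[1])
            cap[v]["t"] = big  # z-nodes connect to the sink and cannot be cut
        cap[u][v] = c
    return max_flow(cap, "s", "t")

def rsp_star_by_flow(db, wild):
    """Smallest min cut over all witnesses w that use a tuple of (r); None if there is none."""
    a_vals = {v[0] for rel, v in db if rel == "A"}
    s_pairs = [v for rel, v in db if rel == "S"]
    big, best = len(db) + 1, None
    for _, (x, y) in wild:  # assumption: (r) contains only R-facts of db
        for y2, z in s_pairs:
            if x in a_vals and y2 == y:
                w = {("A", (x,)), ("R", (x, y)), ("S", (y, z))}
                cut = min_cut_for_witness(db, w, wild)
                if cut < big and (best is None or cut < best):
                    best = cut
    return best

db = (
    {("A", (a,)) for a in ("a1", "a2", "a3")}
    | {("R", p) for p in [("a1", "b1"), ("a1", "b2"), ("a2", "b2"), ("a3", "b3")]}
    | {("S", p) for p in [("b1", "c1"), ("b1", "c2"), ("b2", "c2"), ("b3", "c3")]}
)
wild = {("R", ("a1", "b1")), ("R", ("a1", "b2"))}  # (r) for r = R(a1, *)
k = rsp_star_by_flow(db, wild)
```

On this instance every witness through R(a1, *) yields the same cut: one tuple on the a2 path and one on the a3 path must be removed.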


COROLLARY 5.17. If q has no triad, then RSP*(q) can be made linear by using
dissociations, and is thus in PTIME. Therefore so is RSP(q).

We have thus proved our desired dichotomy for responsibility, and as a bonus, we 
have proved it for responsibility with wildcards as well: 


Theorem 5.18 (Responsibility Dichotomy). Let q be an sj-free CQ, and let q'
be the result of making all fully dominated atoms exogenous. If ℋ(q') contains a triad,
then RSP(q) and RSP*(q) are NP-complete. Otherwise, RSP(q) and RSP*(q) are in PTIME.


It follows from Cor. 5.17 and Cor. 5.14 that RSP*(q) ≡ RSP(q) for all sj-free CQ, q.
Note that it is not at all clear how one would build a reduction from RSP*(q) to RSP(q).
However, our characterization of the complexity of RSP(q) and RSP*(q) gives us this
result: After all fully dominated atoms are made exogenous, if there is a triad, then
RSP(q) is NP-complete, thus so is RSP*(q). If there is no triad, then RSP*(q) ∈ PTIME,
thus so is RSP(q):


COROLLARY 5.19. For all sj-free CQ q, we have RSP(q) ≡ RSP*(q).


5.4. Dichotomy for responsibility with FDs 

Our final theorem is that the dichotomy for responsibility continues to hold in the 
presence of FDs: 
Theorem 5.20 (FD Responsibility Dichotomy). Let (q; Φ) be an sj-free CQ
with functional dependencies. Let (q*; Φ) be its closure under induced rewrites, and
such that all fully dominated atoms of q* are exogenous. If q* has a triad, then RSP(q; Φ)
is NP-complete. Otherwise, RSP(q; Φ) ∈ PTIME.


PROOF. Since FDs only make RSP(q) easier, we know that if q* has no triad then
RSP(q*) is easy, thus so is RSP(q*; Φ) and thus also RSP(q; Φ). For the converse, we show
that the reduction, f, from one of RSP(q_△), RSP(q_rats), RSP(q_T) to RSP(q) which we built in
Lemma 5.8 always produces databases, D', that satisfy Φ. The proof is almost exactly
as in Lemma 4.8. Note that in the proof of Lemma 5.8 we use the same reduction in
all three cases, i.e., no matter if we are reducing from RSP(q_△), RSP(q_rats), or RSP(q_T). □


5.5. Using resilience to compute responsibility more efficiently 

We now show that in applications where we wish to find those tuples of highest
responsibility, we can find them more efficiently by computing resilience instead of
responsibility.

Responsibility provides a measure of the causal contribution of an input tuple to
a query output. In prior work [Meliou et al. 2011; Meliou et al. 2010], in order to
identify likely causes, we ranked input tuples based on their responsibilities: tuples at
the top of the ranking are the most likely causes, whereas tuples low in the ranking
are less likely. Producing this ranking entails computing the responsibility of every
tuple in the database that is a cause for the query. This is computationally expensive,
and, ultimately, unnecessary: Since most applications only care about the top-ranked
causes, we only need to find the set S_p consisting of the tuples of highest responsibility.
Computing the responsibility of other tuples is unnecessary. Using this insight, we can
employ resilience to compute S_p more efficiently than by calculating the responsibility
of every tuple in the database.

Even though resilience is strictly easier to compute than responsibility, we can
compute S_p, the set of tuples of highest responsibility, by repeatedly computing resilience.
The first observation is that any minimum contingency set for resilience is contained
in S_p.


PROPOSITION 5.21. As above, let S_p be the set of tuples of highest responsibility for
a database D satisfying a Boolean query q. Let Γ be a minimum contingency set for (q, D).
Then all members of Γ have maximum responsibility for D ⊨ q, i.e., Γ ⊆ S_p.


Proof. Let q, D, S_p, Γ be as in the statement of the proposition. Let k = |Γ|. Let t be
any element of Γ. Note that Γ − {t} is a contingency set of size k − 1 for the responsibility
of (q, D, t). Suppose for the sake of contradiction that some tuple t' had strictly greater
responsibility than t. Then there must be a contingency set Γ' for the responsibility of
(q, D, t') such that |Γ'| < k − 1. However, this means that Γ' ∪ {t'} is a contingency set
for the resilience of (q, D) of size less than k, contradicting the fact that Γ is a minimum
contingency set. □


Therefore, all tuples in a minimum contingency set for resilience have maximum 
responsibility. However, there may be additional tuples with maximum responsibility 
that are not part of the selected resilience set Γ. These can also be derived by a simple
algorithm based on the following observation. 


OBSERVATION 5.22. Let q, D, S_p, Γ, k be as in the proof of Prop. 5.21 and let t' be
any tuple in D. Let Γ' be a minimum contingency set for the resilience of (q, D − {t'}).
Then t' ∈ S_p iff |Γ'| = k − 1. Furthermore, if |Γ'| = k − 1 then Γ' ⊆ S_p.
Thus, even though responsibility is harder to compute than resilience (Lemma 2.8),
the following algorithm computes the set of tuples of maximum responsibility by
repeatedly computing resilience.

Algorithm 5.23 (Computing max responsibility set, S_p, using resilience).

(1) Let C be the set of causes of D ⊨ q
(2) Let Γ be a minimum contingency set for (q, D)
(3) k := |Γ|; S := Γ
(4) for each c ∈ C − S:
(5)     Let Γ' be a minimum contingency set for (q, D − {c})
(6)     if |Γ'| = k − 1: S := S ∪ Γ' ∪ {c}
(7) return(S)
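
The steps of Algorithm 5.23 can be sketched in Python as shown below. This is only an illustration under stated assumptions: the resilience oracle min_contingency_set is a brute-force search (exponential time) standing in for whatever PTIME procedure applies to the query at hand, all tuples are assumed endogenous, and the loop ranges over all tuples of D rather than only the causes C, which Observation 5.22 permits since it holds for any tuple t'. The function names and the fact encoding are hypothetical.

```python
from itertools import combinations

def q_holds(db):
    """Evaluate q :- A(x), R(x, y), S(y, z) over a set of (relation, values) facts."""
    a_vals = {v[0] for rel, v in db if rel == "A"}
    s_left = {v[0] for rel, v in db if rel == "S"}
    r_pairs = [v for rel, v in db if rel == "R"]
    return any(x in a_vals and y in s_left for x, y in r_pairs)

def min_contingency_set(db, holds):
    """Brute-force resilience oracle: a smallest set of tuples whose deletion falsifies q."""
    facts = sorted(db)
    for size in range(len(facts) + 1):
        for gamma in combinations(facts, size):
            if not holds(db - set(gamma)):
                return set(gamma)
    return None  # only reached if q still holds on the empty database

def max_responsibility_set(db, holds):
    """Sketch of Algorithm 5.23: collect S_p using repeated resilience computations."""
    gamma = min_contingency_set(db, holds)        # step (2)
    k = len(gamma)                                # step (3)
    s = set(gamma)
    for c in sorted(db):                          # step (4), over all tuples of D
        if c in s:
            continue
        gamma_c = min_contingency_set(db - {c}, holds)   # step (5)
        if len(gamma_c) == k - 1:                 # step (6)
            s |= gamma_c | {c}
    return s                                      # step (7)

db = (
    {("A", (a,)) for a in ("a1", "a2", "a3")}
    | {("R", p) for p in [("a1", "b1"), ("a1", "b2"), ("a2", "b2"), ("a3", "b3")]}
    | {("S", p) for p in [("b1", "c1"), ("b1", "c2"), ("b2", "c2"), ("b3", "c3")]}
)
s_p = max_responsibility_set(db, q_holds)
```

On this example the resilience is 3, and the tuples R(a1, b2), S(b1, c1) and S(b1, c2) are excluded from the result because deleting any one of them does not lower the resilience of the remaining database.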

6. RELATED WORK 

Sections 1 and 2 have extensively discussed prior work and the connections between
resilience, deletion propagation and responsibility [Buneman et al. 2002; Cong et al.
2012; Kimelfeld 2012; Kimelfeld et al. 2012]. In this section, we discuss additional
related work.


Data provenance. Data provenance studies formalisms that can characterize the
relation between the input and the output of a given query [Buneman et al. 2001;
Cheney et al. 2009; Cui et al. 2000; Green et al. 2007]. Among the kinds of provenance,
“Why-provenance” is the most closely related to resilience in databases. The motivation
behind Why-provenance is to find the “witnesses” for the query answer, i.e., the tuples
or group of tuples in the input that can produce the answer. Resilience instead searches
for a minimum set of input tuples whose deletion makes a query false.

View updates. The view update problem is a classical problem studied in the
database literature [Bancilhon and Spyratos 1981; Cong et al. 2012; Cosmadakis and
Papadimitriou 1984; Dayal and Bernstein 1982; Fagin et al. 1983; Keller 1985]. In its
general form, the problem consists of finding the set of operations that should be
applied to the database in order to obtain a certain modification in the view. Resilience
and deletion propagation are special cases of view updates.

Causality. The study of causality is important in many areas other than databases,
for example in Artificial Intelligence and philosophy. Although an intuitive concept,
it is difficult to formally define causality and many authors have presented possible
definitions of causality. In our prior work, the notions of causality and responsibility
were strongly inspired by the work of Halpern and Pearl [Chockler and Halpern 2004;
Halpern and Pearl 2005]. Causal reasoning is based on the idea of interventions:
understanding how changes of input variables affect an outcome. It thus relates in spirit
to resilience. In the case of resilience, the intervention is the deletion of input tuples.
In Section 7 we provide some additional discussion on how resilience can address some
applications of causality, with the benefit that it is easier to compute than
responsibility.

Explanations in Databases. Providing explanations to query answers is important
because it can help identify inconsistencies and errors in the data, as well as
understand the data and queries that operate on it. Causality can provide a framework
for explanations of query results [Meliou et al. 2010; Meliou et al. 2011], but it
relies on the computation of responsibility, which is a harder problem than resilience.
Other work on explanations also applies interventions, but on the queries instead of
the data [Roy and Suciu 2014; Wu and Madden 2013]. These approaches try to understand
how the deletion, addition, or modification of predicates may affect the result of a
query. There are also other approaches on deriving explanations that focus on specific
database applications [Agarwal et al. 2007; Barman et al. 2007; Bender et al. 2014;
Fabbri and LeFevre 2011; Khoussainova et al. 2012; Thirumuruganathan et al. 2012].
Finally, the problem of explaining missing query results [Chapman and Jagadish 2009;
Herschel and Hernandez 2010; Huang et al. 2008; Herschel et al. 2009; Tran and Chan
2010] is a problem analogous to deletion propagation, but in this case, we want to add,
rather than remove, tuples from the view. In this paper, we focused the definition of
resilience on tuple deletions; extending it to handle other kinds of updates
is the topic of future work.


7. DISCUSSION AND OUTLOOK 

Summary. This paper presents dichotomy results for the resilience and responsibility
of sj-free conjunctive queries. Our results extend and generalize previous complexity
results on the problem of deletion propagation with source side-effects and causal
responsibility.

Approximation for resilience of sj-free conjunctive queries. The dichotomy
results we establish in this work define sets of queries for which we can solve resilience
in polynomial time, and sets of queries for which the problem is NP-complete. We
cannot hope to find an efficient algorithm for the latter, unless P = NP, but we can look
for an approximation of the optimal solution. In particular, a constant factor
approximation might also be useful for finding a good approximation for the responsibility
problem (see Section 5.5).

Conjunctive queries with self-joins. In order to complete the study of the
complexity of resilience for conjunctive queries, we need to investigate the complexity of
queries with self-joins. It is known that the problem is NP-complete for a query as
simple as q :− S(x), R(x, y), S(y) [Meliou et al. 2010]. We suspect that the insights using
triads to characterize the complexity of resilience in the absence of self-joins may still
be useful in the presence of self-joins.

Unions of conjunctive queries. It would also be quite interesting to understand 
the complexity of computing the resilience for queries that are unions of conjunctive 
queries, i.e., disjunctions of conjunctions. This is a natural extension which we started 
to explore when trying to generalize our results about resilience to responsibility. In 
particular, there is a natural way to view the responsibility of a query as the resilience 
of a union of related queries. 
A. NOMENCLATURE

Notation table

D                  database instance, union of all tuples in the relations, i.e., D = ⋃_i R_i
A_1, ..., A_m      atoms
A^n, A^x           endogenous or exogenous atom
D^n                set of endogenous tuples: D^n ⊆ D
D^x                set of exogenous tuples: D^x = D \ D^n
D ⊨ q              q is true in D
D ⊭ q              q is false in D
Γ                  contingency set: subset of endogenous input tuples, Γ ⊆ D^n
t                  tuple
RES(q)             the resilience problem of query q
RSP(q)             the problem of causal responsibility for query q
DP_source(q)       deletion propagation with source side-effects
DP_view(q)         deletion propagation with view side-effects
q_△                triangle query q_△ :− R(x, y), S(y, z), T(z, x)
q_T                tripod query q_T :− A(x), B(y), C(z), W(x, y, z)
q_rats             rats query q_rats :− A(x), R(x, y), S(y, z), T(z, x)
q_brats            brats query q_brats :− A(x), R(x, y), B(y), S(y, z), T(z, x)
φ, Φ               a functional dependency (FD), or a set of FDs
ℋ                  dual hypergraph (or simply hypergraph, in short)
q*                 closure of q under induced rewrites
var(A_i)           set of all variables occurring in atom A_i
var(q)             set of all variables occurring in query q
T                  triad
dom(D)             set of domain elements of D
(ab)               concatenated new domain values
r                  tuple with wildcards
RSP*(q)            generalization of RSP(q) that computes responsibility of tuples with wildcards r




